Recognizing Images Using Fixations

Garrison Cottrell and Christopher Kanan

Humans acquire visual information serially using eye movements. The foveal region of the retina provides high-resolution information, while the retinal periphery provides lower-resolution information, so people must direct their gaze to relevant or interesting regions of a scene. This is in stark contrast to the predominant approach in computer vision, which processes images in their entirety. Christopher Kanan and Garrison Cottrell, scientists in the UCSD Department of Computer Science and Engineering, recently published a model in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) that acquires information serially in a manner similar to people. The model, called NIMBLE, learns features from natural images; these features exhibit properties qualitatively similar to those of neurons found in early visual cortex. NIMBLE then uses simulated eye movements to acquire information. As information accumulates over time, the system becomes more confident about what it is looking at. NIMBLE directs its simulated fixations to features that are statistically rare in the world, as they are more likely to be useful for discriminating among categories. Kanan and Cottrell found that their relatively simple approach performs as well as, or even better than, state-of-the-art methods in computer vision on object, face, and flower recognition tasks.
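The two ideas described above, fixating statistically rare features and growing more confident as fixations accumulate, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the use of negative log probability as the rarity score, the assumption that fixations are independent given the class, and all numeric values are assumptions made purely for illustration.

```python
import math

def self_information(feature_prob):
    """Rarity score (assumed): low-probability features score higher."""
    return -math.log(feature_prob)

def choose_fixations(feature_probs, num_fixations):
    """Return location indices ordered from rarest feature to most common."""
    ranked = sorted(range(len(feature_probs)),
                    key=lambda i: self_information(feature_probs[i]),
                    reverse=True)
    return ranked[:num_fixations]

def update_posterior(log_posterior, log_likelihoods):
    """Add one fixation's per-class log-likelihoods, then renormalize."""
    updated = [lp + ll for lp, ll in zip(log_posterior, log_likelihoods)]
    peak = max(updated)  # subtract the peak for numerical stability
    log_norm = peak + math.log(sum(math.exp(u - peak) for u in updated))
    return [u - log_norm for u in updated]

def classify(feature_probs, likelihoods, num_fixations):
    """likelihoods[i][c] is a toy P(feature at location i | class c)."""
    num_classes = len(likelihoods[0])
    log_posterior = [math.log(1.0 / num_classes)] * num_classes  # uniform prior
    confidence_trace = []
    for loc in choose_fixations(feature_probs, num_fixations):
        evidence = [math.log(p) for p in likelihoods[loc]]
        log_posterior = update_posterior(log_posterior, evidence)
        confidence_trace.append(math.exp(max(log_posterior)))
    best = max(range(num_classes), key=lambda c: log_posterior[c])
    return best, confidence_trace

# Hypothetical data: four image locations, two candidate classes.
feature_probs = [0.5, 0.05, 0.3, 0.1]          # how common each feature is
likelihoods = [[0.5, 0.5], [0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]
best_class, trace = classify(feature_probs, likelihoods, num_fixations=3)
print(best_class, [round(c, 3) for c in trace])
```

With these made-up numbers, the sketch fixates the three rarest locations first, and the probability assigned to the leading class rises with each fixation, mirroring the growing confidence described in the text.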

NIMBLE
