|Task-Driven Salience: Directing Gaze for Visual Search|
Gary Cottrell, UCSD
Where we look is a decision our brains make about three times a second; there is no decision we make more often in our lifetimes. We move our eyes so much because we have a “foveated retina”: a small spot of very high-resolution color vision at the center of the retina, called the fovea, while vision away from the point of fixation is surprisingly low resolution. The visual system is therefore very efficient at directing gaze to locations that are highly salient, a word used to describe the “interestingness” of a location in the world.

Recently, cognitive scientists at the Temporal Dynamics of Learning Center, led by Garrison Cottrell, devised an automatic way of directing gaze for visual search, as in a “Where’s Waldo” scenario. In work to be published in the journal Visual Cognition, Cottrell’s team of graduate students Christopher Kanan and Matthew Tong, together with former student Lingyun Zhang, all members of the Computer Science and Engineering Department at UCSD, built a system that matches where people look when searching for paintings, mugs, or people in a large collection of images better than any previous system. Not surprisingly, this was accomplished by taking into account what the people were looking for, and by devising a probabilistic system that finds places in the image that resemble the target object. Interestingly, the resulting system often looks in places where there is no target object, but where people also look for it. Matching these “false alarms” was a nice validation of the approach, since the goal is to match what people actually do, even their mistakes! Further work will attempt to extend the system to perform tasks more complex than simple visual search.
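The article does not spell out the team’s actual algorithm, but the core idea of scoring image locations by how much they resemble a sought object can be illustrated with a generic, minimal sketch: sliding a target template over an image and computing a normalized cross-correlation at each location. All names and the toy data below are illustrative assumptions, not the published method.

```python
import numpy as np

def resemblance_map(image, template):
    """Slide a target template over a grayscale image and score how much
    each location resembles the target (normalized cross-correlation)."""
    th, tw = template.shape
    t = template - template.mean()
    t = t / (t.std() + 1e-8)
    h, w = image.shape
    out = np.zeros((h - th + 1, w - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + th, j:j + tw]
            p = patch - patch.mean()
            p = p / (p.std() + 1e-8)
            # Mean of the elementwise product of two normalized patches
            # is their correlation coefficient: 1.0 for a perfect match.
            out[i, j] = (p * t).mean()
    return out

# Toy image containing one bright square "target" at a known location.
img = np.zeros((20, 20))
img[5:9, 12:16] = 1.0
tmpl = np.zeros((6, 6))
tmpl[1:5, 1:5] = 1.0  # template: the same bright square with a border

scores = resemblance_map(img, tmpl)
peak = np.unravel_index(np.argmax(scores), scores.shape)
print(tuple(int(x) for x in peak))  # top-left corner of the best match
```

A real system would of course use learned appearance features rather than a raw pixel template, and would combine resemblance with other cues, but the output is the same kind of object: a map over the image that is high wherever the scene looks like the target.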
Example of system performance. Left: a test image in which subjects were asked to find people. Middle: a “heat map” of the top 25% of fixations made by human subjects. Right: the top 25% most salient areas in the image according to the salience algorithm. The difference between the last two panels is a rough measure of the research that remains to be done.
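The comparison described in the figure — keeping only the top 25% of each map and seeing how well the two overlap — can be sketched in a few lines. This is a generic illustration with hypothetical random maps standing in for real fixation and salience data; the function names and the overlap measure (intersection over union) are assumptions, not the paper’s evaluation metric.

```python
import numpy as np

def top_quartile_mask(value_map):
    """Keep only the top 25% of values in a 2-D map, as in the figure panels."""
    threshold = np.percentile(value_map, 75)
    return value_map >= threshold

def overlap_score(fixation_map, salience_map):
    """Intersection-over-union of the thresholded human and model maps:
    1.0 means the top-25% regions coincide exactly, 0.0 means no overlap."""
    human = top_quartile_mask(fixation_map)
    model = top_quartile_mask(salience_map)
    return np.logical_and(human, model).sum() / np.logical_or(human, model).sum()

# Toy example: random maps standing in for real data.
rng = np.random.default_rng(0)
fixations = rng.random((48, 64))  # hypothetical smoothed human fixation density
salience = rng.random((48, 64))   # hypothetical model salience map
print(round(float(overlap_score(fixations, salience)), 3))
```

For unrelated random maps the score hovers near the chance level for two 25% masks; a good model pushes it toward 1.0.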