Controlling Attention with Noise: The Cue-Combination Model of Visual Search

David F. Baldwin
College of Computer Science, Northeastern University
dfb@ccs.neu.edu

Michael C. Mozer
Institute of Cognitive Science, University of Colorado at Boulder
mozer@colorado.edu

Abstract

Visual search is a ubiquitous human activity. Individuals can perform a remarkable range of tasks involving search for a target object in a cluttered environment with ease and efficiency. Wolfe (1994) proposed a model called Guided Search to explain how attention can be directed to locations containing task-relevant visual features. Despite its attractive qualities, the model is complex, with many arbitrary assumptions and heuristic mechanisms that have no formal justification. We propose a new variant of the Guided Search model that treats selection of task-relevant features for attentional guidance as a problem of cue combination: each visual feature serves as an unreliable cue to the location of the target, and cues from different features must be combined to direct attention to a target. Attentional control involves modulating the level of additive noise on individual feature maps, which affects their reliability as cues, which in turn affects their ability to draw attention. We show that our Cue-Combination Guided Search model obtains results commensurate with Wolfe's Guided Search. Through its leverage of probabilistic formulations of optimal cue combination, the model achieves a degree of mathematical elegance and parsimony, and makes a novel claim concerning the computational role of noise in attentional control.

Introduction

Visual search is a ubiquitous human activity. We search for our keys on a cluttered desk, a familiar face in a crowd, an exit sign on the highway, our favorite brand of cereal at the supermarket, and so forth. That the human visual system can perform such a diverse variety of tasks is truly remarkable. The flexibility of the human visual system stems from the top-down control of attention, which allows processing resources to be directed to task-relevant regions and objects in the visual field. How is attention directed based on an individual's goals? To what sorts of features of the visual environment can attention be directed? These two questions are central to the study of how humans interact with their environment.

Visual search has traditionally been studied in the laboratory using cluttered stimulus displays containing artificial objects. The objects are defined by a set of primitive visual features, such as color, shape, and size. For example, an experimental task might be to search for a red vertical line segment (the target) among green verticals and red horizontals (the distractors). In a reaction-time paradigm, the difficulty of the task is assessed by measuring the response latency to detect the presence or absence of a target.

With a burgeoning experimental literature, models of visual search have been proposed to explain data within a mechanistic framework (e.g., Mozer, 1991; Sandon, 1990; Itti & Koch, 1998). Perhaps the most influential and thoroughly developed model is Guided Search 2.0 (Wolfe, 1994), which we will refer to as GS2.0. Guided Search has been refined further since (Wolfe & Gancarz, 1996; Wolfe, 2001), but its essential claims have remained constant and have been used as a theoretical framework for explaining visual search data for over a decade. Figure 1 (left panel) shows a sketch of GS2.0.

GS2.0, like most models of early visual processing, supposes that the visual scene is analyzed by independent retinotopic feature maps that detect the presence of primitive visual features across the retina along dimensions such as color, orientation, and scale. The feature maps represent each dimension via a coarse coding; that is, the maps for a particular dimension, referred to as the channels, are highly overlapping and broadly tuned. GS2.0 characterizes the channels as encoding categorical features. For example, color has four channels representing the salient color primitives: red, green, blue, and yellow; orientation likewise has four channels representing left, right, steep, and shallow slopes.

The categorical channels are analyzed by a differencing mechanism that enhances local contrast, yielding a bottom-up activation. Top-down control of attention takes place by emphasizing task-relevant channels: the set of channels, one per feature, that best distinguishes the target from its distractors. For example, given a red vertical target among green horizontal distractors, the red and vertical channels should be enhanced, yielding a top-down activation.

The bottom-up and top-down activations from all channels are combined to form a saliency map, in which the activation at a location indicates the priority of that location for the task at hand. Attention is directed to locations in order from most salient to least, and the object at each location is identified. The model supposes that response time is monotonically related to the number of locations that must be searched before the target is found. (The model includes rules for terminating search if no target is found after a reasonable amount of effort.) A minimal computational sketch of this pipeline appears below.
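To make the flow of processing concrete, the following is a minimal sketch in Python. The one-dimensional display, random channel responses, gain values, and the particular differencing rule are illustrative placeholders of our own, not Wolfe's (1994) actual equations or parameters; the sketch only mirrors the qualitative pipeline described above.

```python
import numpy as np

# Illustrative GS2.0-style guided search over a 1-D array of display
# locations. All parameter values are placeholders, not GS2.0's settings.

rng = np.random.default_rng(0)
n_locations = 16

# Channel responses: one array of activations per categorical channel,
# indexed by display location.
channels = {
    "red":        rng.random(n_locations),
    "green":      rng.random(n_locations),
    "vertical":   rng.random(n_locations),
    "horizontal": rng.random(n_locations),
}

def bottom_up(channel):
    """Local-contrast (differencing) signal: how much each location's
    response differs from the mean response of the other locations."""
    total = channel.sum()
    others_mean = (total - channel) / (len(channel) - 1)
    return np.abs(channel - others_mean)

# Top-down gains emphasize the channels that best distinguish a red
# vertical target from green horizontal distractors.
top_down_gain = {"red": 1.0, "green": 0.0, "vertical": 1.0, "horizontal": 0.0}

# Saliency map: combine bottom-up contrast with top-down weighted activation.
saliency = sum(bottom_up(c) + top_down_gain[name] * c
               for name, c in channels.items())

# Serial search: visit locations from most to least salient.
search_order = np.argsort(-saliency)
print("Locations visited in order of decreasing salience:", search_order)
```

In a full simulation, response time would be read off as a monotone function of the target's position in `search_order`, with additional termination rules applied on target-absent trials.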
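The cue-combination framing previewed in the abstract can be related to the standard probabilistic account of cue integration (Yuille et al., 1996; Jacobs, 2002). As an illustrative formulation, not the model's own equations, suppose each channel $i$ delivers an independent Gaussian estimate $s_i$ of the target location with variance $\sigma_i^2$. The optimal combined estimate weights each cue by its inverse variance:

\[
\hat{s} = \frac{\sum_i \sigma_i^{-2}\, s_i}{\sum_j \sigma_j^{-2}} .
\]

Injecting additive noise of variance $\sigma_\eta^2$ into channel $i$ inflates its effective variance to $\sigma_i^2 + \sigma_\eta^2$ and therefore shrinks its weight, so modulating per-channel noise amounts to modulating how strongly each feature map can draw attention. (The notation here is ours; the model's precise formulation is developed in the body of the paper.)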

References

[1] U. Neisser. Visual search. Scientific American, 1964.

[2] J. Wolfe et al. A model of visual search catches up with Jay Enoch 40 years later. 1999.

[3] A. Yuille et al. Bayesian decision theory and psychophysics. 1996.

[4] J. Triesch et al. Democratic integration: Self-organized integration of adaptive cues. Neural Computation, 2001.

[5] M. C. Mozer. The Perception of Multiple Objects: A Connectionist Approach. Neural Network Modeling and Connectionism series, MIT Press, 1991.

[6] R. Zemel et al. Inference and computation with population codes. Annual Review of Neuroscience, 2003.

[7] J. M. Wolfe. Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1994.

[8] J. M. Wolfe. Guided Search 3.0. 1997.

[9] P. A. Sandon. Simulating visual attention. Journal of Cognitive Neuroscience, 1990.

[10] Z.-L. Lu and B. A. Dosher. External noise distinguishes attention mechanisms. Vision Research, 1998.

[11] R. Jacobs. What determines visual cue reliability? Trends in Cognitive Sciences, 2002.