Cost-sensitive learning of top-down modulation for attentional control

We bias a biologically inspired model of visual attention, known as the basic saliency model, toward object detection. The model can be made faster by inhibiting the computation of features or scales that are less important for detecting a given object. To this end, we revise the model by implementing a new scale-wise surround inhibition. Each feature channel and scale is associated with a weight and a processing cost. A global optimization algorithm is then used to find a weight vector that maximizes the detection rate while minimizing the processing cost. This makes it possible to maximize the object detection rate in real-time tasks where the available processing time is limited. A heuristic is also proposed for learning top-down spatial attention control, which further limits the saliency computation. Averaged over five objects, our approach achieves detection rates of 85.4% with the cost term and 92.2% without it, both above the 80% achieved by the basic saliency model. Its average processing cost is 33.3, compared with 52 for the basic model. Our average hit numbers are lower than those of the NVT attentional system but slightly higher than those of VOCUS.
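
The trade-off between detection rate and processing cost can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names, the linear penalty `cost_penalty`, the "most salient location lies on the target" hit criterion, and the plain random search standing in for the global optimizer are all assumptions made for illustration. Channels with zero weight are skipped, so they contribute no processing cost.

```python
import random


def saliency(maps, weights):
    """Weighted sum of feature maps; zero-weight maps are never touched."""
    h, w = len(maps[0]), len(maps[0][0])
    out = [[0.0] * w for _ in range(h)]
    for fmap, wt in zip(maps, weights):
        if wt == 0.0:
            continue  # inhibited channel/scale: no computation spent on it
        for y in range(h):
            for x in range(w):
                out[y][x] += wt * fmap[y][x]
    return out


def processing_cost(weights, costs):
    """Total cost of the channels/scales that are actually computed."""
    return sum(c for wt, c in zip(weights, costs) if wt > 0.0)


def is_hit(sal, target):
    """Assumed criterion: the most salient location lies on the target mask."""
    peak, y, x = max(
        (sal[y][x], y, x) for y in range(len(sal)) for x in range(len(sal[0]))
    )
    return peak > 0.0 and bool(target[y][x])


def fitness(weights, samples, costs, cost_penalty):
    """Detection rate minus a (hypothetical) linear penalty on normalized cost."""
    hits = sum(is_hit(saliency(maps, weights), target) for maps, target in samples)
    rate = hits / len(samples)
    return rate - cost_penalty * processing_cost(weights, costs) / sum(costs)


def random_search(samples, costs, cost_penalty, iters=200, seed=0):
    """Stand-in for a global optimizer: sample sparse weight vectors at random."""
    rng = random.Random(seed)
    best_w, best_f = None, float("-inf")
    for _ in range(iters):
        w = [0.0 if rng.random() < 0.5 else rng.uniform(0.0, 1.0) for _ in costs]
        f = fitness(w, samples, costs, cost_penalty)
        if f > best_f:
            best_w, best_f = w, f
    return best_w, best_f


# Toy data: channel 0 responds to the target, channel 1 to a distractor.
toy_maps = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
toy_target = [[True, False], [False, False]]
toy_samples = [(toy_maps, toy_target)]
toy_costs = [1.0, 3.0]
best_weights, best_fitness = random_search(toy_samples, toy_costs, cost_penalty=0.5)
```

In this toy setting the search should settle on a weight vector that keeps the cheap, informative channel and switches off the expensive, distracting one. Setting `cost_penalty` to zero corresponds to optimizing the detection rate alone.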
