An integrated model of visual attention using shape-based features

Apart from helping shed some light on human perceptual mechanisms, modeling visual attention has important applications in computer vision. It has been shown to be useful in priming object detection [16, 29], pruning interest points [26], quantifying visual clutter [23] as well as predicting human eye movements [17]. Prior work has either relied on purely bottom-up approaches [11] or top-down schemes using simple low-level features [16, 19]. In this paper, we outline a top-down visual attention model based on shape-based features. The same shape-based representation is used to represent both the objects and the scenes that contain them. The spatial priors imposed by the scene and the feature priors imposed by the target object are combined in a Bayesian framework to generate a task-dependent saliency map. We show that our approach can predict the location of objects as well as match eye movements (92% overlap with human observers). We also show that the proposed approach performs better than existing bottom-up and top-down computational models.

[1]  Pietro Perona,et al.  Is bottom-up attention useful for object recognition? , 2004, CVPR 2004.

[2]  J. Movshon,et al.  Linearity and Normalization in Simple Cells of the Macaque Primary Visual Cortex , 1997, The Journal of Neuroscience.

[3]  M. Posner,et al.  Components of visual orienting , 1984 .

[4]  Nuno Vasconcelos,et al.  Integrated learning of saliency, complex features, and object detectors from cluttered scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[5]  Rajesh P. N. Rao,et al.  Eye movements in iconic visual search , 2002, Vision Research.

[6]  Antonio Torralba,et al.  Modeling global scene factors in attention. , 2003, Journal of the Optical Society of America. A, Optics, image science, and vision.

[7]  J. Maunsell,et al.  Feature-based attention in visual cortex , 2006, Trends in Neurosciences.

[8]  Antonio Torralba,et al.  Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes , 2003, NIPS.

[9]  Yuanzhen Li,et al.  Feature congestion: a measure of display clutter , 2005, CHI.

[10]  Thomas Serre,et al.  A Theory of Object Recognition: Computations and Circuits in the Feedforward Path of the Ventral Stream in Primate Visual Cortex , 2005 .

[11]  Laurent Itti,et al.  An Integrated Model of Top-Down and Bottom-Up Attention for Optimizing Detection Speed , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[13]  C. Koch,et al.  Computational modelling of visual attention , 2001, Nature Reviews Neuroscience.

[14]  Thomas Serre,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Kunihiko Fukushima,et al.  A neural network model for selective attention in visual pattern recognition , 1986, Biological Cybernetics.

[16]  Antonio Torralba,et al.  Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. , 2006, Psychological review.

[17]  John F. Kalaska,et al.  Computational neuroscience : theoretical insights into brain function , 2007 .

[18]  Liqing Zhang,et al.  Saliency Detection: A Spectral Residual Approach , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Tim K Marks,et al.  SUN: A Bayesian framework for saliency using natural statistics. , 2008, Journal of vision.

[20]  Antonio Torralba,et al.  Contextual Priming for Object Detection , 2003, International Journal of Computer Vision.

[21]  R. Desimone,et al.  Neural mechanisms of selective visual attention. , 1995, Annual review of neuroscience.

[22]  Jeremy M. Wolfe,et al.  Guided Search 4.0: Current Progress With a Model of Visual Search , 2007, Integrated Models of Cognitive Systems.

[23]  John K. Tsotsos,et al.  Modeling Visual Attention via Selective Tuning , 1995, Artif. Intell..

[24]  David J. Freedman,et al.  Categorical representation of visual stimuli in the primate prefrontal cortex. , 2001, Science.

[25]  F. Fleuret Fast Binary Feature Selection with Conditional Mutual Information , 2004, J. Mach. Learn. Res..

[26]  E. Miller,et al.  Response to Comment on "Top-Down Versus Bottom-Up Control of Attention in the Prefrontal and Posterior Parietal Cortices" , 2007, Science.

[27]  Nuno Vasconcelos,et al.  Bottom-up saliency is a discriminant process , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[28]  Stanley M. Bileschi,et al.  Street Scenes: towards scene understanding in still images , 2006 .

[29]  Laurent Itti,et al.  Beyond bottom-up: Incorporating task-dependent influences into a computational model of spatial attention , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  N. Kanwisher,et al.  Visual attention: Insights from brain imaging , 2000, Nature Reviews Neuroscience.

[31]  Rajesh P. N. Rao,et al.  Bayesian Inference and Attentional Modulation in the Visual Cortex Correspondence and Requests for Reprints to Rajesh , 2005 .

[32]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[33]  M. A. Steinmetz,et al.  Posterior Parietal Cortex Automatically Encodes the Location of Salient Stimuli , 2005, The Journal of Neuroscience.

[34]  Antonio Torralba,et al.  Top-down control of visual attention in object detection , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[35]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.