Q-learning of sequential attention for visual object recognition from informative local descriptors

This work provides a framework for learning sequential attention in real-world visual object recognition, using an architecture of three processing stages. The first stage rejects irrelevant local descriptors based on an information-theoretic saliency measure, providing candidates for foci of interest (FOI). The second stage investigates the information in the FOI using a codebook matcher that provides weak object hypotheses. The third stage integrates local information via shifts of attention, resulting in chains of descriptor-action pairs that characterize object discrimination. A Q-learner then adapts from explorative search and from evaluative feedback given by entropy decreases along the attention sequences, eventually prioritizing shifts that lead to a geometry of descriptor-action scanpaths that is highly discriminative with respect to object recognition. The methodology is successfully evaluated on indoor (COIL-20 database) and outdoor (TSG-20 database) imagery, demonstrating a significant impact of learning and outperforming standard local-descriptor-based methods in both recognition accuracy and processing time.
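The learning step in the third stage can be illustrated with a minimal sketch, under stated assumptions. The abstract does not give the state encoding, the action set, or the learning parameters, so the sketch uses the standard tabular Q-learning update with a reward equal to the entropy decrease of the posterior over object hypotheses after one attention shift. All names (`entropy_reward`, `q_update`, `epsilon_greedy`) and the values of `alpha`, `gamma`, and `epsilon` are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import defaultdict


def entropy(posterior):
    """Shannon entropy in bits of a discrete distribution over object hypotheses."""
    return -sum(p * math.log2(p) for p in posterior if p > 0.0)


def entropy_reward(posterior_before, posterior_after):
    """Assumed reward: how much one attention shift reduced the uncertainty."""
    return entropy(posterior_before) - entropy(posterior_after)


def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update (alpha and gamma are illustrative).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the highest-valued shift."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


if __name__ == "__main__":
    # Hypothetical setup: states are codebook indices of the attended descriptor,
    # actions are coarse directions of the next attention shift.
    actions = ["north", "east", "south", "west"]
    Q = defaultdict(float)
    rng = random.Random(0)

    state = 3  # hypothetical descriptor codebook index at the current focus
    action = epsilon_greedy(Q, state, actions, epsilon=0.2, rng=rng)

    # Hypothetical posteriors over four objects before and after the shift.
    before = [0.25, 0.25, 0.25, 0.25]
    after = [0.5, 0.5, 0.0, 0.0]
    reward = entropy_reward(before, after)

    next_state = 7  # hypothetical descriptor index found at the new focus
    q_update(Q, state, action, reward, next_state, actions)
    print(action, reward, Q[(state, action)])
```

In this sketch a shift that halves the set of plausible objects earns one bit of reward, so shifts that quickly disambiguate objects accumulate higher Q-values and are preferred by the greedy policy.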
