A feedforward theory of visual cortex accounts for human performance in rapid categorization

Primates are remarkably good at recognizing objects in cluttered natural images. The level of performance of the primate visual system and its robustness to image variability have remained unchallenged by the best computer vision systems despite decades of engineering effort. We developed a new model of the feedforward path of the ventral stream in primate visual cortex that incorporates many anatomical and physiological constraints. Its key property – in addition to supervised learning from IT cortex to higher areas – is an unsupervised learning stage that creates from natural images a large generic dictionary of tuned units from V2 to IT useful for different recognition tasks. Remarkably, these model units exhibit tuning properties consistent with the known physiology of the main visual areas. Here we report that the model can predict both the level and the pattern of performance achieved by humans on a difficult animal vs. non-animal rapid categorization task. The high performance of a feedforward, hierarchical model compared with existing computer vision systems and with human vision supports a theoretical framework for understanding properties of single neurons and cortical areas in the context of high-level visual functions while suggesting a novel architecture for computer vision systems.

Object recognition in cortex is mediated by the ventral visual pathway running from primary visual cortex 1, V1, through extrastriate visual areas V2 and V4 to inferotemporal cortex 2-4, IT (comprising PIT and AIT), and then to prefrontal cortex (PFC), which is involved in linking perception to memory and action. It is well known that recognition is possible for scenes viewed in rapid visual presentation that do not allow sufficient time for eye movements 5-10 and in the near-absence of attention 11.
The hypothesis that the basic processing of information is feedforward is supported most directly by the short times required for a selective response to appear in IT cells 12. Very recent data 13 convincingly show that the activity of small neuronal populations in monkey IT, over very short time intervals (as small as 12.5 ms) and only about 100 ms after stimulus onset, contains surprisingly accurate and robust information supporting a variety of recognition tasks. While this does not rule out the use of local feedback loops within an area, it does suggest that a core hierarchical feedforward architecture may be a reasonable starting point for a theory of visual cortex, aiming to explain " …

[1]  F. Attneave Some informational aspects of visual perception. , 1954, Psychological review.

[2]  D. H. Hubel,et al.  Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. , 1965, Journal of neurophysiology.

[3]  D. Hubel,et al.  Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.

[4]  S. Grossberg Contour Enhancement, Short Term Memory, and Constancies in Reverberating Neural Networks , 1973 .

[5]  P. Schiller,et al.  Quantitative studies of single-cell properties in monkey striate cortex. I. Spatiotemporal organization of receptive fields. , 1976, Journal of neurophysiology.

[6]  R. Desimone,et al.  Prestriate afferents to inferior temporal cortex: an HRP study , 1980, Brain Research.

[7]  A. Treisman,et al.  A feature-integration theory of attention , 1980, Cognitive Psychology.

[8]  A G Barto,et al.  Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.

[9]  M. Alexander,et al.  Principles of Neural Science , 1981 .

[10]  J. P. Jones,et al.  An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. , 1987, Journal of neurophysiology.

[11]  I. Biederman Recognition-by-components: a theory of human image understanding. , 1987, Psychological review.

[12]  Leslie G. Ungerleider,et al.  Pathways for motion analysis: Cortical connections of the medial superior temporal and fundus of the superior temporal visual areas in the macaque , 1990, The Journal of comparative neurology.

[13]  Neil A. Macmillan,et al.  Detection Theory: A User's Guide , 1991 .

[14]  Peter Földiák,et al.  Learning Invariance from Transformation Sequences , 1991, Neural Comput..

[15]  R. Desimone Face-Selective Cells in the Temporal Cortex of Monkeys , 1991, Journal of Cognitive Neuroscience.

[16]  D I Perrett,et al.  Organization and functions of cells responsive to faces in the temporal cortex. , 1992, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[17]  D. V. van Essen,et al.  Neuronal responses to static texture patterns in area V1 of the alert macaque monkey. , 1992, Journal of neurophysiology.

[18]  Leslie G. Ungerleider,et al.  The modular organization of projections from areas V1 and V2 to areas V4 and TEO in macaques , 1993, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[19]  D. Ruderman The statistics of natural images , 1994 .

[20]  M. Tovée Neuronal Processing: How fast is the speed of thought? , 1994, Current Biology.

[21]  M. Tovée,et al.  Processing speed in the cerebral cortex and the neurophysiology of visual masking , 1994, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[22]  G Kovács,et al.  Cortical correlate of pattern backward masking. , 1995, Proceedings of the National Academy of Sciences of the United States of America.

[23]  N. Logothetis,et al.  Shape representation in the inferior temporal cortex of monkeys , 1995, Current Biology.

[24]  László Györfi,et al.  A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[25]  D. C. Van Essen,et al.  Neural responses to polar, hyperbolic, and Cartesian gratings in area V4 of the macaque monkey. , 1996, Journal of neurophysiology.

[26]  Denis Fize,et al.  Speed of processing in the human visual system , 1996, Nature.

[27]  Bartlett W. Mel SEEMORE: Combining Color, Shape, and Texture Histogramming in a Neurally Inspired Approach to Visual Object Recognition , 1997, Neural Computation.

[28]  H. Markram,et al.  Regulation of Synaptic Efficacy by Coincidence of Postsynaptic APs and EPSPs , 1997, Science.

[29]  E. Rolls,et al.  Invariant Face and Object Recognition in the Visual System , 1997, Progress in Neurobiology.

[30]  D H Brainard,et al.  The Psychophysics Toolbox. , 1997, Spatial vision.

[31]  P. Goldman-Rakic,et al.  Areal segregation of face-processing neurons in prefrontal cortex. , 1997, Science.

[32]  J. Wolfe,et al.  Preattentive Object Files: Shapeless Bundles of Basic Features , 1997, Vision Research.

[33]  Leslie G. Ungerleider,et al.  Cortical projections of area V2 in the macaque. , 1997, Cerebral cortex.

[34]  Jean Bullier,et al.  The Timing of Information Transfer in the Visual System , 1997 .

[35]  D G Pelli,et al.  The VideoToolbox software for visual psychophysics: transforming numbers into movies. , 1997, Spatial vision.

[36]  Keiji Tanaka,et al.  Effects of shape-discrimination training on the selectivity of inferotemporal cells in adult monkeys. , 1998, Journal of neurophysiology.

[37]  E. Rolls,et al.  View-invariant representations of familiar objects by neurons in the inferior temporal visual cortex. , 1998, Cerebral cortex.

[38]  G. Bi,et al.  Synaptic Modifications in Cultured Hippocampal Neurons: Dependence on Spike Timing, Synaptic Strength, and Postsynaptic Cell Type , 1998, The Journal of Neuroscience.

[39]  R. Desimone,et al.  Competitive Mechanisms Subserve Attention in Macaque Areas V2 and V4 , 1999, The Journal of Neuroscience.

[40]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[41]  E Corthout,et al.  Timing of activity in early visual cortex as revealed by transcranial magnetic stimulation. , 1999, Neuroreport.

[42]  E. Rolls,et al.  The Neurophysiology of Backward Visual Masking: Information Analysis , 1999, Journal of Cognitive Neuroscience.

[43]  R. von der Heydt,et al.  Coding of Border Ownership in Monkey Visual Cortex , 2000, The Journal of Neuroscience.

[44]  Mark C. W. van Rossum,et al.  Stable Hebbian Learning from Spike Timing-Dependent Plasticity , 2000, The Journal of Neuroscience.

[45]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[46]  V. Lamme,et al.  The distinct modes of vision offered by feedforward and recurrent processing , 2000, Trends in Neurosciences.

[47]  Edmund T. Rolls,et al.  A Model of Invariant Object Recognition in the Visual System: Learning Rules, Activation Functions, Lateral Inhibition, and Information-Based Performance Measures , 2000, Neural Computation.

[48]  C. Connor,et al.  Shape representation in area V4: position-specific tuning for boundary conformation. , 2001, Journal of neurophysiology.

[49]  Tomaso A. Poggio,et al.  Example-Based Object Detection in Images by Components , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[50]  S. Thorpe,et al.  Seeking Categories in the Brain , 2001, Science.

[51]  Thomas Serre,et al.  Categorization by Learning and Combining Object Parts , 2001, NIPS.

[52]  David J. Freedman,et al.  Categorical representation of visual stimuli in the primate prefrontal cortex. , 2001, Science.

[53]  P. Földiák,et al.  The Speed of Sight , 2001, Journal of Cognitive Neuroscience.

[54]  Eero P. Simoncelli,et al.  Natural image statistics and neural representation. , 2001, Annual review of neuroscience.

[55]  N. Sigala,et al.  Visual categorization shapes feature selectivity in the primate temporal cortex , 2002, Nature.

[56]  S. Hochstein,et al.  View from the Top: Hierarchies and Reverse Hierarchies in the Visual System , 2002, Neuron.

[57]  Thomas Serre,et al.  On the Role of Object-Specific Features for Real World Object Recognition in Biological Vision , 2002, Biologically Motivated Computer Vision.

[58]  Antonio Torralba,et al.  Depth Estimation from Image Structure , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[59]  Simon J. Thorpe,et al.  Ultra-Rapid Scene Categorization with a Wave of Spikes , 2002, Biologically Motivated Computer Vision.

[60]  David J. Freedman,et al.  Visual categorization and the primate prefrontal cortex: neurophysiology and behavior. , 2002, Journal of neurophysiology.

[61]  Michel Vidal-Naquet,et al.  Visual features of intermediate complexity and their use in classification , 2002, Nature Neuroscience.

[62]  T. Gawne,et al.  Responses of primate visual cortical V4 neurons to simultaneously presented stimuli. , 2002, Journal of neurophysiology.

[63]  H. Abarbanel,et al.  Dynamical model of long-term synaptic plasticity , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[64]  M. Behrmann,et al.  Impact of learning on representation of parts and wholes in monkey inferotemporal cortex , 2002, Nature Neuroscience.

[65]  G. Rousselet,et al.  Is it an animal? Is it a human face? Fast processing in upright and inverted natural scenes. , 2003, Journal of vision.

[66]  Y. Amit,et al.  An integrated network for invariant visual detection and recognition , 2003, Vision Research.

[67]  C. Koch,et al.  Visual Selective Behavior Can Be Triggered by a Feed-Forward Process , 2003, Journal of Cognitive Neuroscience.

[68]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[69]  H. Seung,et al.  Learning in Spiking Neural Networks by Reinforcement of Stochastic Synaptic Transmission , 2003, Neuron.

[70]  David J. Freedman,et al.  A Comparison of Primate Prefrontal and Inferior Temporal Cortices during Visual Categorization , 2003, The Journal of Neuroscience.

[71]  Heiko Wersing,et al.  Learning Optimized Features for Hierarchical Models of Invariant Object Recognition , 2003, Neural Computation.

[72]  Tomaso Poggio,et al.  Intracellular measurements of spatial integration and the MAX operation in complex cells of the cat primary visual cortex. , 2004, Journal of neurophysiology.

[73]  M. Riesenhuber,et al.  Face processing in humans is compatible with a simple shape–based model of vision , 2004, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[74]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[75]  A. Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[76]  Kunihiko Fukushima,et al.  Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[77]  Tomaso Poggio,et al.  A New Biologically Motivated Framework for Robust Object Recognition , 2004 .

[78]  Brian Leung,et al.  Component-based Car Detection in Street Scene Images , 2004 .

[79]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[80]  Jitendra Malik,et al.  When is scene identification just texture recognition? , 2004, Vision Research.

[81]  Thomas Serre,et al.  Object recognition with features inspired by visual cortex , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[82]  Lior Wolf,et al.  A Unified System For Object Detection, Texture Recognition, and Context Analysis Based on the Standard Model Feature Set , 2005, BMVC.

[83]  S. Thorpe,et al.  The time course of visual processing: Backward masking and natural scene categorisation , 2005, Vision Research.

[84]  A. Treisman,et al.  Perception of objects in natural scenes: is it really attention free? , 2005, Journal of experimental psychology. Human perception and performance.

[85]  S. Thorpe,et al.  Spike times make sense , 2005, Trends in Neurosciences.

[86]  Shimon Ullman,et al.  Feature hierarchies for object classification , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[87]  Tomaso Poggio,et al.  Fast Readout of Object Identity from Macaque Inferior Temporal Cortex , 2005, Science.

[88]  Thomas Serre,et al.  A Theory of Object Recognition: Computations and Circuits in the Feedforward Path of the Ventral Stream in Primate Visual Cortex , 2005 .

[89]  Michael S. Lewicki,et al.  Efficient auditory coding , 2006, Nature.

[90]  Lior Wolf,et al.  Empirical Comparison between Hierarchical Fragments Based and Standard Model Based Object Recognition Systems , 2006 .

[91]  Simon J Thorpe,et al.  Animals roll around the clock: the rotation invariance of ultrarapid visual processing. , 2006, Journal of vision.

[92]  Russell L. De Valois,et al.  Spatial frequency selectivity of cells in macaque visual cortex , 1982 .