The invariance hypothesis implies domain-specific regions in visual cortex

Is visual cortex made up of general-purpose information-processing machinery, or does it consist of a collection of specialized modules? If prior knowledge acquired from learning one set of objects transfers only to new objects that share properties with the old ones, then the recognition system's optimal organization must contain specialized modules for different object classes. Our analysis starts from a premise we call the invariance hypothesis: that the computational goal of the ventral stream is to compute a signature for recognition that is both invariant to transformations and discriminative. The key condition enabling approximate transfer of invariance without sacrificing discriminability turns out to be that the learned and novel objects transform similarly. This implies that the optimal recognition system must contain subsystems trained only with data from similarly transforming objects, and it suggests a novel interpretation of domain-specific regions such as the fusiform face area (FFA). Furthermore, we can define an index of transformation compatibility, computable from videos, that can be combined with information about the statistics of natural vision to predict which object categories ought to have domain-specific regions; these predictions agree with the available data. The result is a unifying account linking the large literature on view-based recognition with the wealth of experimental evidence concerning domain-specific regions.
