The Invariance Hypothesis Implies Domain-Specific Regions in Visual Cortex

Is visual cortex made up of general-purpose information-processing machinery, or does it consist of a collection of specialized modules? If prior knowledge acquired from learning a set of objects is transferable only to new objects that share properties with the old, then the recognition system’s optimal organization must contain specialized modules for different object classes. Our analysis starts from a premise we call the invariance hypothesis: that the computational goal of the ventral stream is to compute a signature for recognition that is both invariant to transformations and discriminative. The key condition enabling approximate transfer of invariance without sacrificing discriminability turns out to be that the learned and novel objects transform similarly. This implies that the optimal recognition system must contain subsystems trained only with data from similarly transforming objects, and it suggests a novel interpretation of domain-specific regions like the fusiform face area (FFA). Furthermore, we can define an index of transformation compatibility, computable from videos, that can be combined with information about the statistics of natural vision to predict which object categories ought to have domain-specific regions; these predictions agree with the available data. The result is a unifying account linking the large literature on view-based recognition with the wealth of experimental evidence concerning domain-specific regions.
