Representation Learning in Sensory Cortex: A Theory

We review and apply a computational theory of the feedforward path of the ventral stream in visual cortex based on the hypothesis that its main function is the encoding of invariant representations of images. A key justification of the theory is provided by a theorem linking invariant representations to small sample complexity for recognition: that is, invariant representations allow learning from very few labeled examples. The theory characterizes how an algorithm that can be implemented by a set of "simple" and "complex" cells (an "HW module") provides invariant and selective representations. The invariance can be learned in an unsupervised way from observed transformations. Theorems show that invariance implies several properties of the ventral stream organization, including the eccentricity-dependent lattice of units in the retina and in V1, and the tuning of its neurons. The theory requires two stages of processing: the first, consisting of retinotopic visual areas such as V1, V2 and V4 with generic neuronal tuning, leads to representations that are invariant to translation and scaling; the second, consisting of modules in IT with class- and object-specific tuning, provides a representation for recognition with approximate invariance to class-specific transformations, such as pose (of a body, of a face) and expression. In the theory, the ventral stream's main function is the unsupervised learning of "good" representations that reduce the sample complexity of the final supervised learning stage. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF 1231216.
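The invariance property of an HW module can be illustrated with a minimal sketch. It assumes 1-D signals and cyclic translation as the transformation group; the function name `hw_signature`, the random integer-valued templates, the threshold values, and the choice of a thresholded mean as the pooling nonlinearity are all illustrative assumptions, not details taken from the paper. "Simple cells" compute dot products between the input and every stored transformation of a template; "complex cells" pool a nonlinear function of those dot products over the transformations.

```python
import numpy as np

def hw_signature(image, templates, thresholds):
    """Toy HW-module signature (illustrative assumption, not the paper's code).

    image:      1-D array of length n
    templates:  2-D array, one template per row, each of length n
    thresholds: 1-D array of thresholds used by the pooling nonlinearity
    """
    n = image.shape[0]
    signature = []
    for template in templates:
        # "Simple cells": dot products of the image with every cyclic
        # shift of the template (the stored transformations).
        responses = np.array([image @ np.roll(template, s) for s in range(n)])
        # "Complex cells": pool a thresholded nonlinearity over all shifts.
        # Each value is the fraction of shifts whose response exceeds a threshold.
        for th in thresholds:
            signature.append(np.mean(responses > th))
    return np.array(signature)

rng = np.random.default_rng(0)
n = 32
# Integer-valued signals keep the dot products exact in floating point.
image = rng.integers(-3, 4, size=n).astype(float)
templates = rng.integers(-3, 4, size=(4, n)).astype(float)
thresholds = np.array([-8.5, -3.5, 0.5, 4.5, 9.5])

sig = hw_signature(image, templates, thresholds)
sig_shifted = hw_signature(np.roll(image, 7), templates, thresholds)

# Shifting the image only permutes the simple-cell responses,
# so the pooled signature is unchanged.
invariant = np.array_equal(sig, sig_shifted)
print("signature invariant to translation:", invariant)
```

In this toy setting the signature is exactly invariant because the stored shifts form a full group orbit of each template; with a partial set of observed transformations the invariance would only be approximate.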
