Representation Learning in Sensory Cortex: A Theory

We review and apply a computational theory of the feedforward path of the ventral stream in visual cortex, based on the hypothesis that its main function is the encoding of invariant representations of images. A key justification of the theory is provided by a theorem linking invariant representations to small sample complexity for recognition; that is, invariant representations allow learning from very few labeled examples. The theory characterizes how an algorithm that can be implemented by a set of "simple" and "complex" cells (an "HW module") provides invariant and selective representations. The invariance can be learned in an unsupervised way from observed transformations. Theorems show that invariance implies several properties of the ventral stream organization, including the eccentricity-dependent lattice of units in the retina and in V1, and the tuning of its neurons. The theory requires two stages of processing. The first, consisting of retinotopic visual areas such as V1, V2 and V4 with generic neuronal tuning, leads to representations that are invariant to translation and scaling. The second, consisting of modules in IT with class- and object-specific tuning, provides a representation for recognition with approximate invariance to class-specific transformations, such as pose (of a body, of a face) and expression. In the theory, the main function of the ventral stream is the unsupervised learning of "good" representations that reduce the sample complexity of the final supervised learning stage.

This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF 1231216.
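The simple-cell/complex-cell mechanism of an HW module can be illustrated with a toy example. What follows is a minimal sketch, not the authors' implementation, assuming 1-D integer signals, a finite group G of cyclic translations, no normalization, and step nonlinearities with hand-picked thresholds; the function names (`translate`, `store_orbit`, `hw_signature`) and the example templates are hypothetical. It assumes the common formulation in which simple cells compute the projections ⟨I, g t^k⟩ of the image I onto every stored transformed template g t^k, and each complex cell pools a nonlinearity over the orbit: μ_n^k(I) = (1/|G|) Σ_g η_n(⟨I, g t^k⟩). Translating I only permutes the responses within each orbit, so the pooled values do not change.

```python
# Toy HW module: simple cells project onto stored transformed templates,
# complex cells pool a thresholded response over each template's orbit.
# Assumptions (illustrative only): 1-D signals, G = all cyclic translations,
# no normalization, step nonlinearities eta_n(r) = [r > theta_n].


def translate(signal, shift):
    """Cyclically shift a 1-D signal by `shift` positions (one group element g)."""
    n = len(signal)
    return [signal[(i - shift) % n] for i in range(n)]


def dot(a, b):
    """Inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))


def store_orbit(template):
    """Hypothetical unsupervised stage: keep every observed translation g t of a template."""
    return [translate(template, s) for s in range(len(template))]


def hw_signature(image, orbits, thresholds):
    """Return mu_n^k(I) = (1/|G|) * sum_g [<I, g t^k> > theta_n] for every k, n."""
    signature = []
    for orbit in orbits:
        # Simple cells: one response per stored transformed template.
        responses = [dot(image, gt) for gt in orbit]
        # Complex cells: fraction of simple-cell responses above each threshold.
        for theta in thresholds:
            signature.append(sum(r > theta for r in responses) / len(responses))
    return signature


if __name__ == "__main__":
    orbits = [
        store_orbit([1, -1, 0, 0, 0, 0, 0, 0]),  # edge-like template
        store_orbit([1, 2, 1, 0, 0, 0, 0, 0]),   # blob-like template
    ]
    thresholds = [0, 2, 4]
    image = [0, 1, 3, 1, 0, 0, 0, 0]

    print(hw_signature(image, orbits, thresholds))
    # -> [0.25, 0.0, 0.0, 0.625, 0.375, 0.375]
    print(hw_signature(translate(image, 3), orbits, thresholds))  # same signature
    print(hw_signature([0, 3, 0, 3, 0, 0, 0, 0], orbits, thresholds))  # differs
```

Shifting the image by any number of positions leaves the signature unchanged, while a different image generally produces a different one. In this toy setting the invariance is exact because the group is finite and every orbit is stored completely; the "learning" step is just storing each observed translation of a template, with no labels involved.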

Tomaso Poggio | Fabio Anselmi
