[Figure: layer labels "spatial-pool 3", "conv 4", "spatial-pool 4"; panel title "Receptive Fields in the Brain"]

Humans can recognize objects in a way that is invariant to scale, translation, and clutter. We use invariance theory as the conceptual basis for a computational model of this phenomenon. The theory describes the role of eccentricity in human visual processing and generalizes feedforward convolutional neural networks (CNNs). Our model explains several key psychophysical observations on invariant perception while remaining similar in important respects to biological neural architectures. To our knowledge, this work is the first to unify explanations of all three types of invariance while leveraging the power and neurological grounding of CNNs.
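
The abstract does not spell out how eccentricity enters the network, so the following is only an illustrative sketch and not the authors' implementation. It assumes one common way of expressing eccentricity dependence: concentric square windows around a fixation point, each twice as wide as the last and each average-pooled to the same output size, so that regions farther from fixation are sampled more coarsely. The function name `multiscale_crops` and the parameters `base_size`, `num_scales`, and `out_size` are hypothetical.

```python
import numpy as np

def multiscale_crops(image, fixation, base_size=8, num_scales=3, out_size=8):
    """Return a (num_scales, out_size, out_size) stack of crops centred on fixation.

    Assumptions of this sketch: `image` is a 2-D array, `base_size` is even,
    and `base_size` is a multiple of `out_size`. Crop s covers a window of side
    base_size * 2**s and is average-pooled down to out_size x out_size, so the
    wider (more eccentric) windows are represented at lower resolution.
    """
    cy, cx = fixation
    crops = []
    for s in range(num_scales):
        side = base_size * 2 ** s
        half = side // 2
        # Zero-pad so windows that reach past the image border stay well defined.
        padded = np.pad(image, half, mode="constant")
        # After padding by `half`, the window centred on (cy, cx) starts at (cy, cx).
        window = padded[cy:cy + side, cx:cx + side]
        factor = side // out_size
        # Average pooling: group pixels into factor x factor blocks and take the mean.
        pooled = window.reshape(out_size, factor, out_size, factor).mean(axis=(1, 3))
        crops.append(pooled)
    return np.stack(crops)

image = np.ones((32, 32))
crops = multiscale_crops(image, fixation=(16, 16))
print(crops.shape)
```

Under these assumptions, each slice of the returned stack could be passed through the same convolutional layers, which is one way a standard CNN can be extended to treat the center and the periphery of the visual field at different resolutions.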
