Learning Generic Invariances in Object Recognition: Translation and Scale

Invariance to various transformations is key to object recognition, but existing definitions of invariance are somewhat confusing, and discussions of invariance are often confused. In this report we provide an operational definition of invariance by formally defining perceptual tasks as classification problems; the definition should be appropriate for physiology, psychophysics, and computational modeling. For any specific object, invariance can be trivially “learned” by memorizing a sufficient number of example images of the transformed object. While our formal definition of invariance also covers such cases, this report focuses instead on invariance from very few images, and mostly on invariance from a single example. Image-plane invariances – such as translation, rotation, and scaling – can be computed from a single image for any object. They are called generic because in principle they can be hardwired or learned (during development) for any object. From this perspective, we characterize the invariance range of a class of feedforward architectures for visual recognition that mimic the hierarchical organization of the ventral stream. We show that this class of models achieves essentially perfect translation and scaling invariance for novel images. In this architecture a new image is represented in terms of the weights of “templates” (e.g., “centers” or “basis functions”) at each level of the hierarchy. Such a representation inherits the invariance of each template, which is implemented through replication of the corresponding “simple” units across positions or scales and their “association” in a “complex” unit. We report simulations on real images that characterize the type and number of templates needed to support invariant recognition of novel objects. We find that 1) the templates need not be visually similar to the target objects and 2) a very small number of them is sufficient for good recognition.
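The simple/complex-unit scheme described above can be illustrated with a minimal sketch (an illustration of the general mechanism, not the authors' implementation): “simple” units compute the dot product of a stored template with the image at every position, a “complex” unit max-pools over those responses, and an image's signature is the vector of pooled responses across templates. The templates and 1-D “images” below are hypothetical.

```python
# "Simple" units: one template replicated at every retinal position.
def simple_responses(image, template):
    """Dot product of the template with each window of a 1-D image."""
    k = len(template)
    return [sum(t * x for t, x in zip(template, image[i:i + k]))
            for i in range(len(image) - k + 1)]

# "Complex" unit: pool (max) the replicated simple units over position.
def complex_response(image, template):
    return max(simple_responses(image, template))

# Signature: pooled response to each template; inherits their invariance.
def signature(image, templates):
    return [complex_response(image, t) for t in templates]

# The same pattern embedded at two different positions on a blank retina.
pattern = [1.0, 2.0, 1.0]
img_a = [0.0] * 2 + pattern + [0.0] * 5
img_b = [0.0] * 5 + pattern + [0.0] * 2

templates = [[1.0, 2.0, 1.0], [1.0, -1.0, 1.0]]  # hypothetical templates
print(signature(img_a, templates) == signature(img_b, templates))  # True
```

Note that the second template bears no resemblance to the pattern, yet its pooled response is still translation invariant, consistent with the finding that templates need not be visually similar to the target objects.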
These somewhat surprising empirical results have intriguing implications for the learning of invariant recognition during the development of a biological organism, such as a human baby. In particular, we conjecture that invariance to translation and scale may be learned by the association – through temporal contiguity – of a small number of primal templates, that is, patches extracted from the images of an object moving on the retina across positions and scales. The set of templates can later be augmented by bootstrapping mechanisms that use the correspondence provided by the primal templates – without the need for temporal contiguity. This version replaces a preliminary CBCL paper cited as: Leibo et al., “Invariant Recognition of Objects by Vision,” CBCL-291, November 2, 2010.
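The temporal-contiguity conjecture can be sketched as follows (a self-contained toy illustration under assumed parameters, not the proposed biological mechanism): as a pattern sweeps across a 1-D retina, the positions at which a stored template responds strongly on successive frames are associated into one pooling group, after which the resulting complex unit responds identically wherever the pattern appears.

```python
# Response of a template placed at position i of a 1-D image.
def dot_at(image, template, i):
    return sum(t * x for t, x in zip(template, image[i:i + len(template)]))

# Learning by temporal contiguity: over a sequence of frames, associate
# every position whose response crosses a (hypothetical) threshold.
def learn_pool(frames, template, threshold):
    pool = set()
    for frame in frames:
        for i in range(len(frame) - len(template) + 1):
            if dot_at(frame, template, i) >= threshold:
                pool.add(i)
    return pool

# Complex-unit output: max over the learned (associated) positions only.
def pooled_response(image, template, pool):
    return max(dot_at(image, template, i) for i in pool)

# A pattern translating across the retina, one step per frame.
pattern = [1.0, 2.0, 1.0]
retina = 8
frames = [[0.0] * i + pattern + [0.0] * (retina - 3 - i)
          for i in range(retina - 2)]

pool = learn_pool(frames, pattern, threshold=6.0)
# After learning, the unit's response no longer depends on position:
print(pooled_response(frames[0], pattern, pool) ==
      pooled_response(frames[-1], pattern, pool))  # True
```

In this toy setting the learned pooling group covers every position the pattern visited; bootstrapping would then extend the same pooling range to further templates without requiring new temporal sequences.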
