On Invariance and Selectivity in Representation Learning

Abstract

We discuss data representations that can be learned automatically from data, are invariant to transformations, and are at the same time selective, in the sense that two points have the same representation only if one is a transformation of the other. The mathematical results here sharpen some of the key claims of i-theory, a recent theory of feedforward processing in sensory cortex [3, 4, 5].

Keywords: Invariance, Machine Learning

The paper is submitted to Information and Inference Journal.

1 Introduction

This paper considers the problem of learning "good" data representations that can lower the need for labeled data (sample complexity) in machine learning (ML). Indeed, while current ML systems have achieved impressive results in a variety of tasks, an obvious bottleneck appears to be the huge amount of labeled data needed. This paper builds on the idea that data representations learned in an unsupervised manner can be key to solving this problem. Classical statistical learning theory focuses on supervised learning and postulates that a suitable hypothesis space is given. In turn, under very general conditions, the latter can be seen to be equivalent to a data representation. In other words, the data representation, and how to select and learn it, is classically not considered part of the learning problem, but rather prior information. In practice, ad hoc solutions are often found empirically for each problem.

The study in this paper is a step towards developing a theory of learning data representations. Our starting point is the intuition that, since many learning

[1]  H. Wold,et al.  Some Theorems on Distribution Functions , 1936 .

[2]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[3]  D. H. Hubel,et al.  Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat , 1965, Journal of neurophysiology.

[4]  D. Hubel,et al.  Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.

[5]  S. Zacks The theory of statistical inference , 1972 .

[6]  M. Reed,et al.  Methods of Modern Mathematical Physics. 2. Fourier Analysis, Self-adjointness , 1975 .

[7]  W. B. Johnson,et al.  Extensions of Lipschitz mappings into Hilbert space , 1984 .

[8]  J. Berger Statistical Decision Theory and Bayesian Analysis , 1988 .

[9]  P. Olver Equivalence, Invariants, and Symmetry: References , 1995 .

[10]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[11]  A. Ramm On the theory of reproducing kernel Hilbert spaces , 1998 .

[12]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[13]  Felipe Cucker,et al.  On the mathematical foundations of learning , 2001 .

[14]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[15]  Szymon Rusinkiewicz,et al.  Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors , 2003, Symposium on Geometry Processing.

[16]  Kunihiko Fukushima,et al.  Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[17]  A. Berlinet,et al.  Reproducing kernel Hilbert spaces in probability and statistics , 2004 .

[18]  Matthias Hein,et al.  Hilbertian Metrics and Positive Definite Kernels on Probability Measures , 2005, AISTATS.

[19]  V. Vapnik Estimation of Dependences Based on Empirical Data , 2006 .

[20]  Hans Burkhardt,et al.  Invariant kernel functions for pattern analysis and machine learning , 2007, Machine Learning.

[21]  T. Poggio,et al.  A model of V4 shape selectivity and invariance , 2007, Journal of neurophysiology.

[22]  Yoshua Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn.

[23]  Jan Boman,et al.  Support Theorems for the Radon Transform and Cramér-Wold Theorems , 2008, 0802.4373.

[24]  Stefano Soatto,et al.  Actionable information in vision , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[25]  Herbert Edelsbrunner,et al.  Computational Topology - an Introduction , 2009 .

[26]  David P. Woodruff,et al.  Efficient Sketches for Earth-Mover Distance, with Applications , 2009, 2009 50th Annual IEEE Symposium on Foundations of Computer Science.

[27]  Andrew Zisserman,et al.  Efficient additive kernels via explicit feature maps , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[28]  Bernhard Schölkopf,et al.  Hilbert Space Embeddings and Metrics on Probability Measures , 2009, J. Mach. Learn. Res..

[29]  Stéphane Mallat,et al.  Group Invariant Scattering , 2011, ArXiv.

[30]  Pascal Frossard,et al.  Dictionary learning: What is the right representation for my signal? , 2011 .

[31]  Joel Z. Leibo,et al.  Unsupervised Learning of Invariant Representations in Hierarchical Architectures , 2013, ArXiv.

[32]  Joel Z. Leibo,et al.  Does invariant recognition predict tuning of neurons in sensory cortex? , 2013 .

[33]  Pedro M. Domingos,et al.  Deep Symmetry Networks , 2014, NIPS.

[34]  R. Fergus,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[35]  Tomaso Poggio,et al.  Representation Learning in Sensory Cortex: A Theory , 2014, IEEE Access.

[36]  Max Welling,et al.  Transformation Properties of Learned Visual Representations , 2014, ICLR.

[37]  Joel Z. Leibo,et al.  The Invariance Hypothesis Implies Domain-Specific Regions in Visual Cortex , 2014, bioRxiv.

[38]  Xu Chen,et al.  Deep Haar Scattering Networks , 2015, ArXiv.

[39]  Stefano Soatto,et al.  Modeling Visual Representations: Sufficiency, Minimality, Invariance and Deep Approximation , 2015 .

[40]  Stefano Soatto,et al.  Visual Scene Representations: Sufficiency, Minimality, Invariance and Deep Approximations , 2014, ICLR.

[41]  Joel Z. Leibo,et al.  Invariant Recognition Predicts Tuning of Neurons in Sensory Cortex , 2017 .