Memo No. 63, May 26, 2017

Symmetry Regularization

Fabio Anselmi1,2∗, Georgios Evangelopoulos1∗, Lorenzo Rosasco1,2, Tomaso Poggio1,2

1: Center for Brains, Minds, and Machines, McGovern Institute for Brain Research at MIT, Cambridge, MA, USA
2: Laboratory for Computational and Statistical Learning (LCSL), Istituto Italiano di Tecnologia, Genova, Italy
(∗ equal contribution)

This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.

Abstract

The properties of a representation, such as smoothness, adaptability, generality, and equivariance/invariance, depend on restrictions imposed during learning. In this paper, we propose using data symmetries, in the sense of equivalences under transformations, as a means for learning symmetry-adapted representations, i.e., representations that are equivariant to transformations in the original space. We provide a sufficient condition for enforcing a group structure on the representation (for example, the weights of a neural network layer or the atoms of a dictionary), specifically the group structure present in an unlabeled training set. By reducing the analysis of generic group symmetries to permutation symmetries, we devise an analytic expression for a regularization scheme and a permutation-invariant metric on the representation space. Our work provides a proof of concept on why and how to learn equivariant representations without explicit knowledge of the underlying symmetries in the data.
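As a minimal illustration of the equivariance property described in the abstract, and not the paper's regularization scheme itself, the sketch below assumes the simplest possible symmetry: the cyclic group of shifts acting on vectors. The weights are taken to be the orbit of a single template under that group, which is one way a representation can "have a group structure". All names (`shift`, `orbit_weights`, `represent`, `gram`) and the example vectors are hypothetical and chosen only for this illustration.

```python
# Illustrative sketch only: assumes the cyclic shift group Z_n and a linear
# representation; this is not the regularizer proposed in the memo.

def shift(v, k):
    """Cyclically shift the list v by k positions to the right."""
    n = len(v)
    return [v[(i - k) % n] for i in range(n)]

def orbit_weights(template):
    """Weight matrix whose rows are all cyclic shifts of one template."""
    return [shift(template, h) for h in range(len(template))]

def represent(W, x):
    """Linear representation: inner product of x with every row of W."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]

def gram(W):
    """Gram matrix of the rows of W."""
    return [[sum(a * b for a, b in zip(u, v)) for v in W] for u in W]

template = [1, 2, 0, -1, 3]   # hypothetical template
x = [2, -1, 4, 0, 5]          # hypothetical input
W = orbit_weights(template)
k = 2

# Equivariance: shifting the input permutes (here, shifts) the output.
transformed_then_represented = represent(W, shift(x, k))
represented_then_permuted = shift(represent(W, x), k)
print(transformed_then_represented == represented_then_permuted)

# The Gram matrix of orbit weights depends only on the relative group
# element (h' - h) mod n, so it is circulant: a permutation structure.
G = gram(W)
n = len(W)
is_circulant = all(G[h][j] == G[0][(j - h) % n]
                   for h in range(n) for j in range(n))
print(is_circulant)
```

Here the transformation of the input shows up in the representation as a permutation of its entries, which is the sense in which generic group symmetries can be analyzed through permutation symmetries. Integer inputs are used so that the equality checks are exact.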
