No Representation without Transformation

We extend the framework of variational autoencoders to represent transformations explicitly in the latent space. In the resulting family of hierarchical graphical models, the latent space is populated by higher-order objects that are inferred jointly with the latent representations they act on. To demonstrate the effect of these higher-order objects, we show that the inferred latent transformations reflect interpretable properties in the observation space. Furthermore, the model is structured so that, in the absence of transformations, we can still run inference and obtain generative capabilities comparable to those of standard variational autoencoders. Finally, using the trained encoder, we outperform the baselines by a wide margin on a challenging out-of-distribution classification task.
