Neural Expectation Maximization

Many real-world tasks, such as reasoning and physical interaction, require the identification and manipulation of conceptual entities. A first step towards solving these tasks is the automated discovery of distributed, symbol-like representations. In this paper, we explicitly formalize this problem as inference in a spatial mixture model in which each component is parametrized by a neural network. Based on the Expectation Maximization framework, we then derive a differentiable clustering method that simultaneously learns how to group and represent individual entities. We evaluate our method on the (sequential) perceptual grouping task and find that it accurately recovers the constituent objects. We further demonstrate that the learned representations are useful for next-step prediction.
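Concretely, the method can be read as an EM loop in which the E-step computes soft pixel-to-component assignments under the current predictions, and the M-step updates each component's latent representation by a gradient step through a shared neural decoder. The sketch below illustrates one such iteration; the decoder architecture, the dimensions, the step size, and the fixed-variance Gaussian pixel likelihood are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of one Neural-EM-style iteration, assuming a pixel-wise
# Gaussian likelihood with fixed variance and a uniform mixing prior.
# All names and hyperparameters here are hypothetical placeholders.
import torch
import torch.nn as nn

K, D, H = 3, 64, 32      # components, pixels, latent size (illustrative)
sigma2, lr = 0.25, 0.1   # likelihood variance and M-step step size

decoder = nn.Sequential(  # f_phi: maps each latent theta_k to pixel means psi_k
    nn.Linear(H, 128), nn.ReLU(), nn.Linear(128, D), nn.Sigmoid()
)

def nem_step(x, theta):
    """One EM iteration: x is (D,), theta is (K, H) with requires_grad=True."""
    psi = decoder(theta)                                   # (K, D) predicted means
    # E-step: responsibility of component k for pixel i; with a uniform prior
    # and fixed variance, the normalization constants cancel in the softmax.
    log_lik = -0.5 * (x.unsqueeze(0) - psi) ** 2 / sigma2  # (K, D)
    gamma = torch.softmax(log_lik, dim=0).detach()         # (K, D), held fixed
    # M-step: gradient ascent on the expected log-likelihood w.r.t. theta only
    loss = -(gamma * log_lik).sum()
    grad, = torch.autograd.grad(loss, theta)
    return (theta - lr * grad).detach().requires_grad_(), gamma

x = torch.rand(D)                                # a toy "image" of D pixels
theta = torch.randn(K, H, requires_grad=True)    # per-component latents
for _ in range(15):
    theta, gamma = nem_step(x, theta)
# gamma now soft-assigns each pixel to one of the K component slots
```

Because every step in this loop is differentiable, the unrolled iterations can be trained end-to-end, e.g. by an outer reconstruction loss on the decoder parameters, which is what makes the clustering procedure learnable rather than fixed.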
