Tagger: Deep Unsupervised Perceptual Grouping

We present a framework for efficient perceptual inference that explicitly reasons about the segmentation of its inputs and features. Rather than being trained for any specific segmentation, our framework learns the grouping process in an unsupervised manner or alongside any supervised task. By enriching the representations of a neural network, we enable it to iteratively group the representations of different objects. By allowing the system to amortize the iterative inference of the groupings, we achieve very fast convergence. In contrast to many other recently proposed methods for addressing multi-object scenes, our system does not assume that its inputs are images and can therefore directly handle other modalities. For multi-digit classification of very cluttered images that require texture segmentation, our method offers improved classification performance over convolutional networks despite using a fully connected network. Furthermore, we observe that our system greatly improves on the semi-supervised result of a baseline Ladder network on our dataset, indicating that segmentation can also improve sample efficiency.
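Tagger's own grouping update is a learned network, which is what lets it amortize the iterative inference, and this sketch does not implement it. As a non-learned point of reference for what "iterative inference of the groupings" means, the sketch below runs classic expectation-maximization (EM) on a one-dimensional Gaussian mixture: each input element receives a soft mask over K groups, and each group's parameters are re-estimated from those masks on every iteration. The function name `em_grouping`, the fixed shared variance `sigma`, and the quantile-based initialisation are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def em_grouping(x, k=2, n_iter=20, sigma=0.1):
    """Softly assign each element of x to one of k groups using EM on a
    1-D Gaussian mixture with a shared, fixed variance sigma**2.

    Returns (m, mu): m has shape (k, len(x)) and sums to 1 over groups
    for every element; mu holds the k estimated group means.
    (Illustrative sketch only; not Tagger's learned update.)
    """
    x = np.asarray(x, dtype=float)
    # Initialise group means at evenly spaced quantiles of the input so
    # that distinct groups start from distinct values.
    mu = np.quantile(x, np.linspace(0.0, 1.0, k))
    pi = np.full(k, 1.0 / k)  # mixing proportions
    for _ in range(n_iter):
        # E-step: log of unnormalised responsibilities, shape (k, n).
        log_p = np.log(pi)[:, None] - 0.5 * ((x[None, :] - mu[:, None]) / sigma) ** 2
        log_p -= log_p.max(axis=0, keepdims=True)  # numerical stability
        m = np.exp(log_p)
        m /= m.sum(axis=0, keepdims=True)  # normalise over groups
        # M-step: re-estimate each group's mean and proportion from its mask.
        weight = m.sum(axis=1) + 1e-12  # avoid division by zero
        mu = (m @ x) / weight
        pi = weight / weight.sum()
    return m, mu


if __name__ == "__main__":
    x = np.array([0.0, 0.05, 0.02, 1.0, 0.98, 1.03])
    m, mu = em_grouping(x, k=2)
    print(m.argmax(axis=0), mu)
```

On the toy input above, the hard assignments `m.argmax(axis=0)` separate the three values near 0 from the three near 1. A hand-written mixture like this can only group elements by the quantity it models (here, raw value), whereas the abstract describes learning the grouping update itself.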
