Investigating object compositionality in Generative Adversarial Networks

Deep generative models seek to recover the process with which the observed data was generated. They may be used to synthesize new samples or to subsequently extract representations. Successful approaches in the domain of images are driven by several core inductive biases. However, a bias to account for the compositional way in which humans structure a visual scene in terms of objects has frequently been overlooked. In this work, we investigate object compositionality as an inductive bias for Generative Adversarial Networks (GANs). We present a minimal modification of a standard generator to incorporate this inductive bias and find that it reliably learns to generate images as compositions of objects. Using this general design as a backbone, we then propose two useful extensions to incorporate dependencies among objects and background. We extensively evaluate our approach on several multi-object image datasets and highlight the merits of incorporating structure for representation learning purposes. In particular, we find that our structured GANs are better at generating multi-object images that are more faithful to the reference distribution. More so, we demonstrate how, by leveraging the structure of the learned generative process, one can 'invert' the learned generative model to perform unsupervised instance segmentation. On the challenging CLEVR dataset, it is shown how our approach is able to improve over other recent purely unsupervised object-centric approaches to image generation.

[1]  Aaron C. Courville,et al.  Adversarially Learned Inference , 2016, ICLR.

[2]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[3]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[4]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[6]  Razvan Pascanu,et al.  Relational recurrent neural networks , 2018, NeurIPS.

[7]  Nicolas Le Roux,et al.  Learning a Generative Model of Images by Factoring Appearance and Shape , 2011, Neural Computation.

[8]  Byoung-Tak Zhang,et al.  Generating Images Part by Part with Composite Generative Adversarial Networks , 2016, ArXiv.

[9]  Dhruv Batra,et al.  LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation , 2016, ICLR.

[10]  Koray Kavukcuoglu,et al.  Pixel Recurrent Neural Networks , 2016, ICML.

[11]  Jürgen Schmidhuber,et al.  Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions , 2018, ICLR.

[12]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[13]  Jürgen Schmidhuber,et al.  Learning Factorial Codes by Predictability Minimization , 1992, Neural Computation.

[14]  Razvan Pascanu,et al.  Relational inductive biases, deep learning, and graph networks , 2018, ArXiv.

[15]  Geoffrey E. Hinton,et al.  Attend, Infer, Repeat: Fast Scene Understanding with Generative Models , 2016, NIPS.

[16]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[17]  Hui Jiang,et al.  Generating images with recurrent adversarial networks , 2016, ArXiv.

[18]  P. Milner A model for visual shape recognition. , 1974, Psychological review.

[19]  Klaus Greff,et al.  Multi-Object Representation Learning with Iterative Variational Inference , 2019, ICML.

[20]  Li Fei-Fei,et al.  Image Generation from Scene Graphs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Scott W. Linderman,et al.  Learning Latent Permutations with Gumbel-Sinkhorn Networks , 2018, ICLR.

[22]  Razvan Pascanu,et al.  Deep reinforcement learning with relational inductive biases , 2018, ICLR.

[23]  Ersin Yumer,et al.  ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Juergen Schmidhuber,et al.  Unsupervised Minimax: Adversarial Curiosity, Generative Adversarial Networks, and Predictability Minimization , 2019, ArXiv.

[25]  Mario Lucic,et al.  Are GANs Created Equal? A Large-Scale Study , 2017, NeurIPS.

[26]  C. Spampinato,et al.  Adversarial Framework for Unsupervised Learning of Motion Dynamics in Videos , 2018, International Journal of Computer Vision.

[27]  Andrew Zisserman,et al.  Object Discovery with a Copy-Pasting GAN , 2019, ArXiv.

[28]  Sjoerd van Steenkiste,et al.  A Case for Object Compositionality in Deep Generative Models of Images , 2018, ArXiv.

[29]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[30]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[31]  Trevor Darrell,et al.  Compositional GAN: Learning Image-Conditional Binary Composition , 2018, International Journal of Computer Vision.

[32]  Ludovic Denoyer,et al.  Unsupervised Object Segmentation by Redrawing , 2019, NeurIPS.

[33]  Christopher Burgess,et al.  beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[34]  Geoffrey E. Hinton,et al.  Distributed Representations , 1986, The Philosophy of Artificial Intelligence.

[35]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[36]  Bo Zhang,et al.  Deep Structured Generative Models , 2018, ArXiv.

[37]  Adam Bielski,et al.  Emergence of Object Segmentation in Perturbed Generative Models , 2019, NeurIPS.

[38]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[39]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[40]  Christoph von der Malsburg,et al.  The Correlation Theory of Brain Function , 1994 .

[41]  Trevor Darrell,et al.  Adversarial Feature Learning , 2016, ICLR.

[42]  Razvan Pascanu,et al.  A simple neural network module for relational reasoning , 2017, NIPS.

[43]  Yee Whye Teh,et al.  Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects , 2018, NeurIPS.

[44]  Razvan Pascanu,et al.  Relational Deep Reinforcement Learning , 2018, ArXiv.

[45]  Jessica B. Hamrick,et al.  Simulation as an engine of physical scene understanding , 2013, Proceedings of the National Academy of Sciences.

[46]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Alex Graves,et al.  DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[48]  Xiaohua Zhai,et al.  The GAN Landscape: Losses, Architectures, Regularization, and Normalization , 2018, ArXiv.

[49]  Stefan Wermter,et al.  Generating Multiple Objects at Spatially Distinct Locations , 2019, ICLR.

[50]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[51]  Diederik P. Kingma,et al.  Stochastic Gradient VB and the Variational Auto-Encoder , 2013 .

[52]  Li Fei-Fei,et al.  CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Samy Bengio,et al.  Density estimation using Real NVP , 2016, ICLR.

[54]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[55]  Xiaohua Zhai,et al.  A Large-Scale Study on Regularization and Normalization in GANs , 2018, ICML.

[56]  Jürgen Schmidhuber,et al.  Neural Expectation Maximization , 2017, NIPS.

[57]  Harri Valpola,et al.  Tagger: Deep Unsupervised Perceptual Grouping , 2016, NIPS.