Hierarchical Relational Inference

Common-sense physical reasoning in the real world requires learning about the interactions of objects and their dynamics. The notion of an abstract object, however, encompasses a wide variety of physical objects that differ greatly in terms of the complex behaviors they support. To address this, we propose a novel approach to physical reasoning that models objects as hierarchies of parts that may locally behave separately, but also act more globally as a single whole. Unlike prior approaches, our method learns in an unsupervised fashion directly from raw visual images to discover objects, parts, and their relations. It explicitly distinguishes multiple levels of abstraction and improves over a strong baseline at modeling synthetic and real-world videos.

[1]  Joelle Pineau,et al.  Spatially Invariant Unsupervised Object Detection with Convolutional Neural Networks , 2019, AAAI.

[2]  Nitish Srivastava,et al.  Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.

[3]  Rob Fergus,et al.  Stochastic Video Generation with a Learned Prior , 2018, ICML.

[4]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Razvan Pascanu,et al.  A simple neural network module for relational reasoning , 2017, NIPS.

[6]  Yee Whye Teh,et al.  Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects , 2018, NeurIPS.

[7]  Antonio Torralba,et al.  Generating Videos with Scene Dynamics , 2016, NIPS.

[8]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[9]  Jure Leskovec,et al.  Learning to Simulate Complex Physics with Graph Networks , 2020, ICML.

[10]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Sergey Levine,et al.  Stochastic Variational Video Prediction , 2017, ICLR.

[12]  Raia Hadsell,et al.  Graph networks as learnable physics engines for inference and control , 2018, ICML.

[13]  Klaus Greff,et al.  Multi-Object Representation Learning with Iterative Variational Inference , 2019, ICML.

[14]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[15]  Razvan Pascanu,et al.  Deep reinforcement learning with relational inductive biases , 2018, ICLR.

[16]  Fei Deng,et al.  Generative Hierarchical Models for Parts, Objects, and Scenes , 2019, ArXiv.

[17]  Leonidas J. Guibas,et al.  Learning Shape Abstractions by Assembling Volumetric Primitives , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Jiajun Wu,et al.  Entity Abstraction in Visual Model-Based Reinforcement Learning , 2019, CoRL.

[19]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[20]  Rob Fergus,et al.  Learning Physical Intuition of Block Towers by Example , 2016, ICML.

[21]  Razvan Pascanu,et al.  Relational inductive biases, deep learning, and graph networks , 2018, ArXiv.

[22]  Geoffrey E. Hinton,et al.  Attend, Infer, Repeat: Fast Scene Understanding with Generative Models , 2016, NIPS.

[23]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[24]  Ben Poole,et al.  Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.

[25]  Matthew Botvinick,et al.  MONet: Unsupervised Scene Decomposition and Representation , 2019, ArXiv.

[26]  Jürgen Schmidhuber,et al.  Neural Expectation Maximization , 2017, NIPS.

[27]  Ah Chung Tsoi,et al.  The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.

[28]  Murray Shanahan,et al.  Reconciling deep learning with symbolic artificial intelligence: representing objects and relations , 2019, Current Opinion in Behavioral Sciences.

[29]  Jason Yosinski,et al.  An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution , 2018, NeurIPS.

[30]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[31]  Alexander Lerchner,et al.  COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration , 2019, ArXiv.

[32]  Jürgen Schmidhuber,et al.  Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions , 2018, ICLR.

[33]  Alexander Lerchner,et al.  Spatial Broadcast Decoder: A Simple Architecture for Learning Disentangled Representations in VAEs , 2019, ArXiv.

[34]  Murray Shanahan,et al.  Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules , 2020, ICML.

[35]  Klaus Greff,et al.  A Perspective on Objects and Systematic Generalization in Model-Based RL , 2019, ArXiv.

[36]  Daniel L. K. Yamins,et al.  Flexible Neural Representation for Physics Prediction , 2018, NeurIPS.

[37]  Yedid Hoshen,et al.  VAIN: Attentional Multi-agent Predictive Modeling , 2017, NIPS.

[38]  Klaus Greff,et al.  On the Binding Problem in Artificial Neural Networks , 2020, ArXiv.

[39]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[40]  Razvan Pascanu,et al.  Visual Interaction Networks: Learning a Physics Simulator from Video , 2017, NIPS.

[41]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[42]  Sergey Levine,et al.  Recurrent Independent Mechanisms , 2019, ICLR.

[43]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[44]  Sergey Levine,et al.  Unsupervised Learning for Physical Interaction through Video Prediction , 2016, NIPS.

[45]  Sergey Levine,et al.  Stochastic Adversarial Video Prediction , 2018, ArXiv.

[46]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[47]  Joshua B. Tenenbaum,et al.  A Compositional Object-Based Approach to Learning Physical Dynamics , 2016, ICLR.

[48]  R. Zemel,et al.  Neural Relational Inference for Interacting Systems , 2018, ICML.

[49]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[50]  Mario Fritz,et al.  To Fall Or Not To Fall: A Visual Approach to Physical Stability Prediction , 2016, ArXiv.

[51]  Zhijian Liu,et al.  Modeling Parts, Structure, and System Dynamics via Predictive Learning , 2019 .

[52]  Yee Whye Teh,et al.  The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables , 2016, ICLR.

[53]  Razvan Pascanu,et al.  Interaction Networks for Learning about Objects, Relations and Physics , 2016, NIPS.

[54]  Jürgen Schmidhuber,et al.  R-SQAIR: Relational Sequential Attend, Infer, Repeat , 2019, ArXiv.

[55]  Joshua B. Tenenbaum,et al.  Human-level concept learning through probabilistic program induction , 2015, Science.

[56]  Gabriel Kreiman,et al.  Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning , 2016, ICLR.