Learning Representations that Support Extrapolation

Extrapolation -- the ability to make inferences that go beyond the scope of one's experiences -- is a hallmark of human intelligence. By contrast, the generalization exhibited by contemporary neural network algorithms is largely limited to interpolation between data points in their training corpora. In this paper, we consider the challenge of learning representations that support extrapolation. We introduce a novel visual analogy benchmark that allows the graded evaluation of extrapolation as a function of distance from the convex domain defined by the training data. We also introduce a simple technique, temporal context normalization, that encourages representations that emphasize the relations between objects. We find that this technique enables a significant improvement in the ability to extrapolate, considerably outperforming a number of competitive techniques.
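As a rough illustration of the idea behind temporal context normalization, the sketch below z-scores each feature dimension across the time steps of a single sequence, so that what remains is how the items in that context relate to one another rather than their absolute values. The abstract does not specify the procedure, so the exact form of the normalization, the function name `temporal_context_norm`, the `eps` constant, and the toy inputs are all assumptions made for illustration; any learned scale or shift parameters are omitted.

```python
import numpy as np


def temporal_context_norm(z, eps=1e-8):
    """Normalize a sequence of embeddings over its temporal context.

    z has shape (T, D): T time steps (e.g. the objects presented in one
    analogy problem), each a D-dimensional embedding. Each feature is
    z-scored across the T steps of this sequence only (an assumption
    about how the technique works, not taken from the abstract).
    """
    mu = z.mean(axis=0, keepdims=True)                    # per-feature mean over time
    sigma = np.sqrt(z.var(axis=0, keepdims=True) + eps)   # per-feature std over time
    return (z - mu) / sigma


# Two hypothetical contexts with the same relational structure
# (equal spacing between items) but very different absolute ranges.
a = np.array([[1.0], [2.0], [3.0]])
b = a * 10.0 + 100.0   # shifted and rescaled copy, far from the range of `a`

norm_a = temporal_context_norm(a)
norm_b = temporal_context_norm(b)

# After normalization the two contexts are (numerically) indistinguishable.
same = np.allclose(norm_a, norm_b)
```

Under these assumptions, a downstream network that only sees the normalized embeddings receives the same input for `a` and `b`, which is one way a representation could remain usable outside the range of values seen during training.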
