Known unknowns: Learning novel concepts using reasoning-by-elimination

People can learn new visual concepts without any samples, from information given by language or by deductive reasoning. For instance, people can use elimination to infer the meaning of novel labels from their context. While recognizing novel concepts has been studied intensively in zero-shot learning with semantic descriptions, training models to learn by elimination has received far less attention. Here we describe the first approach to train an agent to reason by elimination, by providing instructions that contain both familiar and unfamiliar concepts (“pick the red box and the green wambim”). In our framework, the agent combines a perception module with a reasoning module that includes internal memory. It uses reinforcement learning to construct a reasoning policy that, by considering all available items in a room, can make a correct inference even for never-seen objects or concepts. Furthermore, the agent can then perform one-shot learning and use the newly learned concepts to infer additional novel concepts. We evaluate this approach in a new set of environments, showing that agents successfully learn to reason by elimination and can also learn novel concepts and use them for further reasoning. This approach paves the way toward handling open-world environments by extending the abundant supervised learning approaches with reasoning frameworks that can handle novel concepts.
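To make the inference pattern concrete, the following is a minimal symbolic sketch of reasoning by elimination followed by one-shot binding. It is not the agent described above, which learns its policy with reinforcement learning from perception. The function name `resolve`, the item fields `color` and `shape`, the shape identifiers, and the second novel word `dax` are all assumptions made for illustration.

```python
def resolve(color, noun, room, lexicon):
    """Return the index of the item meant by `color noun`, or None if ambiguous.

    `room` is a list of dicts with hypothetical "color" and "shape" fields.
    `lexicon` maps known nouns to shape identifiers and is updated in place.
    """
    # Keep only items whose (familiar) color matches the instruction.
    matching = [i for i, item in enumerate(room) if item["color"] == color]
    if noun in lexicon:
        # Familiar noun: direct lookup.
        hits = [i for i in matching if room[i]["shape"] == lexicon[noun]]
    else:
        # Novel noun: eliminate every item whose shape already has a name.
        named = set(lexicon.values())
        hits = [i for i in matching if room[i]["shape"] not in named]
    if len(hits) != 1:
        return None  # elimination did not single out one item
    # One-shot learning: bind the novel noun to the remaining shape.
    lexicon.setdefault(noun, room[hits[0]]["shape"])
    return hits[0]


lexicon = {"box": "s_box", "ball": "s_ball"}
room1 = [
    {"color": "red", "shape": "s_box"},
    {"color": "green", "shape": "s_ball"},
    {"color": "green", "shape": "s_17"},
]
first = resolve("green", "wambim", room1, lexicon)

# The newly learned "wambim" now helps eliminate candidates for "dax".
room2 = [
    {"color": "blue", "shape": "s_17"},
    {"color": "blue", "shape": "s_42"},
    {"color": "blue", "shape": "s_box"},
]
second = resolve("blue", "dax", room2, lexicon)
```

In the second room, the item with shape `s_17` can be ruled out only because `wambim` was bound in the first room, which mirrors the claim that newly learned concepts support further inference.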
