Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving

We incorporate Tensor-Product Representations into the Transformer to better support the explicit representation of relational structure. Our Tensor-Product Transformer (TP-Transformer) sets a new state of the art on the recently introduced Mathematics Dataset, which contains 56 categories of free-form math word problems. The essential component of the model is a novel attention mechanism, TP-Attention, which explicitly encodes the relation between each Transformer cell and the other cells whose values it retrieves through attention. TP-Attention goes beyond a linear combination of retrieved values, strengthening representation-building and resolving ambiguities introduced by multiple layers of standard attention. The TP-Transformer's attention maps also give better insight into how it solves the Mathematics Dataset's challenging problems. Pretrained models and code will be made available after publication.
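
To make the binding idea behind TP-Attention concrete, the sketch below shows a multi-head attention layer whose per-head attended value is bound to a learned relation vector via an element-wise (Hadamard) product, a compressed stand-in for a full tensor product. This is a minimal, hypothetical PyTorch sketch under that assumption: the module name, the extra relation projection `w_r`, and all dimensions are illustrative and not the authors' exact implementation.

```python
# Hypothetical sketch of a TP-Attention-style layer: standard multi-head
# attention whose per-head output is bound to a learned relation vector
# via an element-wise (Hadamard) product. Names and dimensions are
# illustrative assumptions, not the paper's exact implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TPAttentionSketch(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Standard query/key/value projections plus an extra "relation"
        # projection producing one relation vector per query position and head.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_r = nn.Linear(d_model, d_model)  # relation vectors (assumption)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape

        def split(h: torch.Tensor) -> torch.Tensor:
            # -> (batch, heads, seq_len, d_head)
            return h.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v, r = map(split, (self.w_q(x), self.w_k(x), self.w_v(x), self.w_r(x)))

        # Ordinary scaled dot-product attention: a weighted (linear)
        # combination of the retrieved values.
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attended = F.softmax(scores, dim=-1) @ v

        # Binding step: each head's attended value is bound to its relation
        # vector with a Hadamard product, so the output carries which
        # role/relation the retrieved filler plays, not only its content.
        bound = attended * r

        out = bound.transpose(1, 2).contiguous().view(b, t, -1)
        return self.w_o(out)
```

A layer built this way is a drop-in replacement for standard multi-head self-attention; for example, `TPAttentionSketch(d_model=512, n_heads=8)` applied to a `(batch, seq_len, 512)` tensor returns a tensor of the same shape.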
