Tensor Product Generation Networks for Deep NLP Modeling

We present a new approach to the design of deep networks for natural language processing (NLP), based on the general technique of Tensor Product Representations (TPRs) for encoding and processing symbol structures in distributed neural networks. We propose a network architecture, the Tensor Product Generation Network (TPGN), that is capable in principle of carrying out TPR computation but uses unconstrained deep learning to design its internal representations. Instantiated as a model for image-caption generation, TPGN outperforms LSTM baselines when evaluated on the COCO dataset. The TPR-capable structure makes the internal representations and operations interpretable, and they prove to contain considerable grammatical content. Our caption-generation model can be interpreted as generating a sequence of grammatical categories and retrieving words by their categories from a plan encoded as a distributed representation.
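
To make the TPR technique concrete, here is a minimal illustrative sketch (not the paper's implementation; all dimensions and variable names below are assumptions). The core operations are: bind each filler vector (e.g. a word embedding) to a role vector (e.g. a position or grammatical category) via an outer product, superpose the bindings into one tensor, and later retrieve a filler by "unbinding" with the dual of its role vector.

```python
import numpy as np

# Tensor product representation (TPR) in a nutshell:
#   T = sum_i outer(f_i, r_i)
# where f_i are filler vectors and r_i are role vectors. If the roles are
# linearly independent, filler i is recovered exactly by multiplying T with
# the unbinding vector u_i satisfying r_j . u_i = delta_ij.

rng = np.random.default_rng(0)

d_f, d_r, n = 8, 5, 3                    # filler dim, role dim, #bindings
fillers = rng.normal(size=(n, d_f))      # e.g. word embeddings
roles = rng.normal(size=(n, d_r))        # random roles are independent a.s.

# Bind and superpose: T has shape (d_f, d_r).
T = sum(np.outer(f, r) for f, r in zip(fillers, roles))

# Unbinding vectors: the pseudoinverse gives roles @ U = I, so column i
# of U is dual to role i.
U = np.linalg.pinv(roles)                # shape (d_r, n)

for i in range(n):
    recovered = T @ U[:, i]              # unbind role i
    assert np.allclose(recovered, fillers[i])
print("all fillers recovered from the superposed TPR")
```

In the TPGN described above, the analogous retrieval step would select a word (filler) given a generated grammatical-category (role) vector; the sketch only shows the underlying bind/unbind algebra.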
