Document Representation via Generalized Coupled Tensor Chain with the Rotation Group Constraint

Continuous representations of linguistic structures are an important part of modern natural language processing systems. Despite their diversity, most existing log-multilinear embedding models are built on vector operations. These operations, however, cannot faithfully represent the compositionality of natural language because they do not preserve word order. In this work, we focus on a promising alternative that embeds documents and words in the rotation group by generalizing the coupled tensor chain decomposition to the exponential family of probability distributions. In this model, documents and words are represented as matrices, and n-gram representations are composed from word representations by matrix multiplication. The proposed model is optimized via noise-contrastive estimation. We show empirically that capturing word order and higher-order word interactions allows our model to achieve the best results on several document classification benchmarks.
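To make the composition idea concrete, below is a minimal sketch of how n-gram representations can be formed by multiplying word matrices constrained to the rotation group. It is not the paper's implementation: the vocabulary, dimension, and random-rotation sampling are illustrative assumptions, and no training (e.g., noise-contrastive estimation) is shown.

```python
# Illustrative sketch only: word matrices are sampled as random rotations,
# not learned as in the paper. Names and sizes are hypothetical.
import numpy as np

def random_rotation(dim, rng):
    """Sample a random rotation matrix from SO(dim) via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    q *= np.sign(np.diag(r))          # fix column signs for a unique factorization
    if np.linalg.det(q) < 0:          # enforce det(+1), i.e. a proper rotation
        q[:, 0] *= -1.0
    return q

rng = np.random.default_rng(0)
dim = 4
vocab = {w: random_rotation(dim, rng) for w in ["the", "cat", "sat"]}

# An n-gram representation is the ordered product of its word matrices,
# so "the cat sat" and "sat cat the" generally get different representations.
ngram = vocab["the"] @ vocab["cat"] @ vocab["sat"]
reversed_ngram = vocab["sat"] @ vocab["cat"] @ vocab["the"]

print(np.allclose(ngram, reversed_ngram))         # False: word order matters
print(np.allclose(ngram @ ngram.T, np.eye(dim)))  # True: the product stays in SO(dim)
```

Because rotations are closed under multiplication, the composed n-gram matrix remains orthogonal with unit determinant, which keeps products well-conditioned regardless of n-gram length while still being sensitive to word order.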
