A Distributional Semantics Model for Idiom Detection - The Case of English and Russian

This paper describes experiments in automatic idiom detection for English and Russian. Our algorithm is based on the idea that literal and idiomatic expressions appear in different contexts, a difference that our distributional semantics model captures. We evaluate the model on both languages, compare the results, and show that the model is language-independent. We also describe a new annotated resource we created for our
