Aligning Sentences from Standard Wikipedia to Simple Wikipedia

This work improves monolingual sentence alignment for text simplification, specifically for text in standard and simple Wikipedia. We introduce a method that improves over past efforts by using a greedy (vs. ordered) search over the document and a word-level semantic similarity score based on Wiktionary (vs. WordNet) that also accounts for structural similarity through syntactic dependencies. Experiments show improved performance on a hand-aligned set, with the largest gain coming from structural similarity. Resulting datasets of manually and automatically aligned sentence pairs are made available.
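The greedy (as opposed to ordered) search mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `jaccard` and `greedy_align`, the one-to-one pairing constraint, and the threshold value are all assumptions made for illustration, and a plain word-overlap Jaccard score stands in for the Wiktionary-based semantic and structural similarity that the paper describes.

```python
from itertools import product


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity (assumption): word-overlap Jaccard score in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def greedy_align(standard, simple, sim=jaccard, threshold=0.3):
    """Pair sentences in order of descending similarity, ignoring document order.

    Returns a list of (standard_index, simple_index, score) tuples.
    Each sentence is used at most once (a simplifying assumption).
    """
    # Score every standard/simple sentence pair, best first.
    scored = sorted(
        (
            (sim(s, t), i, j)
            for (i, s), (j, t) in product(enumerate(standard), enumerate(simple))
        ),
        reverse=True,
    )
    used_standard, used_simple, pairs = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # all remaining pairs score lower still
        if i in used_standard or j in used_simple:
            continue  # one side is already aligned
        used_standard.add(i)
        used_simple.add(j)
        pairs.append((i, j, score))
    return pairs


standard_doc = ["The cat sat on the mat", "Dogs bark loudly at night"]
simple_doc = ["Dogs bark at night", "The cat sat"]
alignment = greedy_align(standard_doc, simple_doc)
```

Because pairs are chosen by score rather than by position, the second standard sentence is aligned with the first simple sentence here, which an order-preserving search could not do without giving up the other pair.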

Wei Wu | Mari Ostendorf | Hannaneh Hajishirzi | William Hwang
