An Empirical Study of Smoothing Techniques for Language Modeling

We survey the most widely used algorithms for smoothing n-gram language models. We then present an extensive empirical comparison of several of these smoothing techniques, including t...
