The Use of the Maximum Likelihood Criterion in Language Modelling

This paper gives an overview of the use of the maximum likelihood criterion in stochastic language modelling. The criterion and its associated estimation techniques provide a unifying framework for approaches that seem unrelated at first glance, such as smoothing and cross-validation, decision trees (CART), word classes obtained by clustering, word trigger pairs, and maximum entropy models.
