Class-Based n-gram Models of Natural Language

We address the problem of predicting a word from previous words in a sample of text. In particular, we discuss n-gram models based on classes of words. We also discuss several statistical algorithms for assigning words to classes based on the frequency of their co-occurrence with other words. We find that we are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.
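As a rough illustration of what a class-based bigram model might look like, the sketch below uses the common factorization P(w_i | w_{i-1}) = P(w_i | c_i) · P(c_i | c_{i-1}), where c_i is the class of word w_i. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `train_class_bigram`, the toy corpus, and the hand-assigned class map are all hypothetical, and the estimates are plain relative frequencies with no smoothing. The class-assignment algorithms mentioned above would induce the classes from co-occurrence statistics rather than take them as given.

```python
from collections import Counter

def train_class_bigram(tokens, word_to_class):
    """Return a function estimating P(word | prev_word) through word classes.

    Assumes every word belongs to exactly one class (hypothetical, hand-made map).
    """
    classes = [word_to_class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    class_bigrams = Counter(zip(classes, classes[1:]))
    # How often each class occurs as a history (every position except the last).
    history_counts = Counter(classes[:-1])

    def prob(word, prev_word):
        c, c_prev = word_to_class[word], word_to_class[prev_word]
        if history_counts[c_prev] == 0:
            return 0.0  # class never seen as a history; no smoothing in this sketch
        p_word_given_class = word_counts[word] / class_counts[c]
        p_class_given_prev = class_bigrams[(c_prev, c)] / history_counts[c_prev]
        return p_word_given_class * p_class_given_prev

    return prob

# Toy corpus and hand-assigned classes (illustrative only).
tokens = "the cat sat on the mat the dog sat".split()
word_to_class = {"the": "DET", "cat": "NOUN", "mat": "NOUN", "dog": "NOUN",
                 "sat": "VERB", "on": "PREP"}
prob = train_class_bigram(tokens, word_to_class)

# "mat sat" never occurs in the corpus, yet the class model gives it nonzero
# probability because NOUN -> VERB transitions were observed.
print(prob("sat", "mat"))
```

The point of the design is that words sharing a class pool their statistics, so an unseen word pair can still receive probability mass when its class pair has been observed.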
