Learning probabilistic automata with variable memory length

We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata, which we name Probabilistic Finite Suffix Automata. The learning algorithm is motivated by real applications in man-machine interaction such as handwriting and speech recognition. Conventionally used fixed-memory Markov models and hidden Markov models have severe drawbacks, either practical or theoretical. Though general hardness results are known for learning distributions generated by sources with similar structure, we prove that our algorithm can efficiently learn distributions generated by our more restricted sources. In particular, we show that the KL-divergence between the distribution generated by the target source and the distribution generated by our hypothesis can be made small with high confidence in polynomial time and sample complexity. We demonstrate the applicability of our algorithm by learning the structure of natural English text and using our hypothesis for the correction of corrupted text.
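
To make the idea of variable memory length concrete, the following is a minimal illustrative sketch and not the algorithm analyzed in the paper. It predicts the next symbol from the longest suffix of the history that was observed often enough, and it computes the standard KL-divergence between two finite distributions. The function names, the `max_len` bound, and the `min_count` threshold are assumptions introduced only for this illustration.

```python
import math
from collections import defaultdict


def count_contexts(text, max_len):
    """Count next-symbol occurrences for every context of length 0..max_len."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, symbol in enumerate(text):
        for k in range(max_len + 1):
            if i - k < 0:
                break
            counts[text[i - k:i]][symbol] += 1
    return counts


def predict(counts, history, max_len, min_count=2):
    """Next-symbol distribution from the longest suffix of `history`
    seen at least `min_count` times (hypothetical threshold)."""
    for k in range(min(max_len, len(history)), -1, -1):
        context = history[len(history) - k:]
        total = sum(counts[context].values()) if context in counts else 0
        if total >= min_count:
            return {s: c / total for s, c in counts[context].items()}
    return {}


def kl_divergence(p, q):
    """D(p || q) in nats; assumes q[s] > 0 wherever p[s] > 0."""
    return sum(pv * math.log(pv / q[s]) for s, pv in p.items() if pv > 0)


counts = count_contexts("abababab", max_len=2)
after_ab = predict(counts, "ab", max_len=2)   # uses the length-2 context "ab"
after_xa = predict(counts, "xa", max_len=2)   # "xa" unseen, falls back to "a"
after_x = predict(counts, "x", max_len=2)     # falls back to the empty context
```

Here the amount of memory used depends on the history: a frequently seen long context is used in full, while an unseen one falls back to a shorter suffix. A fixed-order Markov model would instead condition on the same context length everywhere.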
