On the learnability and usage of acyclic probabilistic finite automata

We propose and analyze a distribution learning algorithm for a subclass of acyclic probabilistic finite automata (APFA). This subclass is characterized by a certain distinguishability property of the automata's states. Although hardness results are known for learning distributions generated by general APFAs, we prove that our algorithm can efficiently learn distributions generated by the subclass of APFAs we consider. In particular, we show that the KL-divergence between the distribution generated by the target source and the distribution generated by our hypothesis can be made arbitrarily small with high confidence in polynomial time. We present two applications of our algorithm. In the first, we show how to model cursively written letters; the resulting models are part of a complete cursive handwriting recognition system. In the second application, we demonstrate how APFAs can be used to build multiple-pronunciation models for spoken words, and we evaluate the APFA-based pronunciation models on labeled speech data. The good performance (in terms of the log-likelihood obtained on test data) achieved by the APFAs and the little time needed for learning suggest that the APFA learning algorithm may be a powerful alternative to commonly used probabilistic models.
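Because an APFA is acyclic, every path from the start state reaches the final state in finitely many steps, so the automaton defines a probability distribution over a finite set of strings. The following sketch (a hypothetical encoding for illustration, not the paper's algorithm) enumerates that distribution for a tiny target APFA and a nearby hypothesis APFA, and computes the KL-divergence between them:

```python
from math import log

# Hypothetical APFA encoding: transitions[state] maps a symbol to
# (next_state, probability); a designated final state ends the string.

def string_probs(transitions, start, final):
    """Enumerate all generated strings and their probabilities via DFS
    (finite, because the automaton is acyclic)."""
    dist = {}
    def dfs(state, prefix, prob):
        if state == final:
            dist[prefix] = dist.get(prefix, 0.0) + prob
            return
        for sym, (nxt, p) in transitions[state].items():
            dfs(nxt, prefix + sym, prob * p)
    dfs(start, "", 1.0)
    return dist

def kl_divergence(p, q):
    """KL(p || q) over p's support; raises KeyError if q misses a string
    that p assigns positive probability (KL would be infinite)."""
    return sum(pp * log(pp / q[s]) for s, pp in p.items() if pp > 0)

# A toy target APFA over {a, b} with final state 3, and a hypothesis
# whose transition probabilities are slightly perturbed.
target = {
    0: {"a": (1, 0.6), "b": (2, 0.4)},
    1: {"a": (3, 0.5), "b": (3, 0.5)},
    2: {"b": (3, 1.0)},
}
hypothesis = {
    0: {"a": (1, 0.55), "b": (2, 0.45)},
    1: {"a": (3, 0.5), "b": (3, 0.5)},
    2: {"b": (3, 1.0)},
}

p = string_probs(target, 0, 3)      # {"aa": 0.3, "ab": 0.3, "bb": 0.4}
q = string_probs(hypothesis, 0, 3)
print(kl_divergence(p, q))          # small, since the hypothesis is close
```

As the abstract states, the learning guarantee is exactly of this form: with high confidence, the KL-divergence between the target's distribution and the hypothesis's distribution is driven arbitrarily close to zero in polynomial time.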
