The Hierarchical Hidden Markov Model: Analysis and Applications

We introduce, analyze, and demonstrate a recursive hierarchical generalization of the widely used hidden Markov models, which we name Hierarchical Hidden Markov Models (HHMM). Our model is motivated by the complex multi-scale structure that appears in many natural sequences, particularly in language, handwriting, and speech. We seek a systematic unsupervised approach to the modeling of such structures. By extending the standard Baum-Welch (forward-backward) algorithm, we derive an efficient procedure for estimating the model parameters from unlabeled data. We then use the trained model for automatic hierarchical parsing of observation sequences. We describe two applications of our model and its parameter estimation procedure. In the first application we show how to construct hierarchical models of natural English text; in these models, different levels of the hierarchy correspond to structures on different length scales in the text. In the second application we demonstrate how HHMMs can be used to automatically identify repeated strokes that represent combinations of letters in cursive handwriting.
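
To make the recursive structure concrete, the following is a minimal, hypothetical sketch of an HHMM as a generative process: internal states activate a child HMM, production (leaf) states emit observation symbols, and an end state returns control to the parent, which then makes a horizontal transition. This is an illustrative toy only; it does not reproduce the paper's generalized Baum-Welch estimation or hierarchical parsing procedures, and names such as `HHMMState`, `sample`, and the two-level "word" model are assumptions introduced here.

```python
import random

class HHMMState:
    """Toy HHMM state: either a production (emitting) state or an internal state
    that owns a child HMM (children + horizontal transitions among them)."""
    def __init__(self, name, emissions=None, children=None, transitions=None):
        self.name = name
        self.emissions = emissions            # symbol -> prob (production states only)
        self.children = children or []        # child states (internal states only)
        self.transitions = transitions or {}  # child name -> {next child name or 'END': prob}

    def is_production(self):
        return self.emissions is not None

def _choose(dist):
    # Sample a key from a dict of probabilities.
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r <= acc:
            return item
    return item  # numerical slack

def sample(state, initial):
    """Generate a flat observation sequence by recursive top-down activation.
    `initial` maps an internal state's name to a distribution over its children."""
    if state.is_production():
        return [_choose(state.emissions)]
    seq = []
    by_name = {c.name: c for c in state.children}
    current = by_name[_choose(initial[state.name])]
    while True:
        seq.extend(sample(current, initial))        # recurse into the child sub-HMM
        nxt = _choose(state.transitions[current.name])
        if nxt == 'END':                            # end state returns control to the parent
            return seq
        current = by_name[nxt]

# Hypothetical two-level model: a "word" state alternating two letter emitters.
a = HHMMState('a', emissions={'x': 0.9, 'y': 0.1})
b = HHMMState('b', emissions={'y': 0.8, 'z': 0.2})
word = HHMMState('word', children=[a, b],
                 transitions={'a': {'b': 0.7, 'END': 0.3},
                              'b': {'a': 0.6, 'END': 0.4}})
print(sample(word, initial={'word': {'a': 1.0}}))
```

In the full model, each level's transition and emission parameters are estimated jointly from unlabeled sequences via the extended forward-backward procedure, and the trained hierarchy is then used to parse new observation sequences at multiple length scales.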
