Probabilistic Independence Networks for Hidden Markov Probability Models

Graphical techniques for modeling the dependencies of random variables have been explored in a variety of areas, including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper presents a self-contained review of the basic principles of PINs. We show that the well-known forward-backward (F-B) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models gives HMM practitioners a set of analysis tools for exploring a richer class of HMM structures. Examples of relatively complex models for sensor fusion and for coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
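The claim that F-B and Viterbi are special cases of general PIN inference can be illustrated on the simplest case, a chain-structured PIN. The sketch below is a minimal illustration and not the general algorithm from the paper. It assumes a discrete HMM in conventional notation that is not defined in the text above: an initial distribution `pi`, a transition matrix `A`, and an emission matrix `B`. The helper names (`forward_messages`, `posteriors`, `viterbi`) are hypothetical. A single forward message-passing routine yields the F-B forward variables when incoming messages are combined with `sum`, and the Viterbi scores when they are combined with `max`.

```python
from typing import Callable, Iterable, List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def forward_messages(pi: Sequence[float], A: Matrix, B: Matrix, obs: Sequence[int],
                     combine: Callable[[Iterable[float]], float]) -> List[Vector]:
    """Pass messages left to right along the hidden chain H1 -> ... -> HT.

    combine=sum gives the forward (alpha) variables of forward-backward;
    combine=max gives the Viterbi (delta) scores. Nothing else changes.
    """
    n = len(pi)
    msgs = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = msgs[-1]
        msgs.append([combine(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)])
    return msgs


def backward_messages(A: Matrix, B: Matrix, obs: Sequence[int]) -> List[Vector]:
    """Right-to-left sum messages (the beta variables of forward-backward)."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    return beta


def posteriors(pi, A, B, obs):
    """Forward-backward: return p(obs) and p(H_t = i | obs) for every t and i."""
    alpha = forward_messages(pi, A, B, obs, sum)
    beta = backward_messages(A, B, obs)
    likelihood = sum(alpha[-1])
    post = [[a * b / likelihood for a, b in zip(alpha[t], beta[t])]
            for t in range(len(obs))]
    return likelihood, post


def viterbi(pi, A, B, obs):
    """Viterbi: the same forward pass with max, followed by backtracking."""
    delta = forward_messages(pi, A, B, obs, max)
    n = len(pi)
    path = [max(range(n), key=lambda j: delta[-1][j])]
    for t in range(len(obs) - 1, 0, -1):
        j = path[-1]
        # Best predecessor i of state j at time t maximizes delta[t-1][i] * A[i][j].
        path.append(max(range(n), key=lambda i: delta[t - 1][i] * A[i][j]))
    path.reverse()
    return max(delta[-1]), path


if __name__ == "__main__":
    # Hypothetical two-state, three-symbol example parameters.
    pi = [0.6, 0.4]
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    obs = [0, 1, 2]
    print(posteriors(pi, A, B, obs))
    print(viterbi(pi, A, B, obs))
```

With these example parameters the Viterbi path is `[0, 0, 1]`. Because the routine multiplies raw probabilities, long sequences will underflow. A practical version would rescale the messages at each step or work in log space, where `max` remains `max` and the products become sums.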
