Switching State-Space Models

We introduce a statistical model for time series data with nonlinear dynamics that iteratively segments the data into regimes with approximately linear dynamics and learns the parameters of each of those regimes. This model combines and generalizes two of the most widely used stochastic time series models, the hidden Markov model and the linear dynamical system, and is related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network model (Jacobs et al., 1991) to its fully dynamical version, in which both the expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and therefore the exact Expectation Maximization (EM) algorithm cannot be applied. However, we present a variational approximation which maximizes a lower bound on the log likelihood and makes use of both the forward-backward recursions for hidden Markov models and the Kalman filter recursions for linear dynamical systems.
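As a concrete illustration of the class of model the abstract describes, the following is a minimal sketch of a switching state-space model in notation chosen here (M linear regimes indexed by a discrete switch variable S_t with transition matrix \Phi, a continuous state vector x_t^{(m)} for each regime m, and observations y_t); the specific symbols and the Gaussian noise assumptions are ours, not quoted from the paper:

  P(S_t = m \mid S_{t-1} = n) = \Phi_{nm}
  x_t^{(m)} = A^{(m)} x_{t-1}^{(m)} + w_t^{(m)}, \quad w_t^{(m)} \sim \mathcal{N}(0, Q^{(m)})
  y_t = C^{(S_t)} x_t^{(S_t)} + v_t, \quad v_t \sim \mathcal{N}(0, R)

Because the switch couples all M continuous state chains in the posterior, exact E-step inference is intractable. A structured variational approximation of the sort the abstract refers to would instead optimize a factorized distribution Q and the corresponding lower bound on the log likelihood,

  Q(S_{1:T}, \{x_{1:T}^{(m)}\}) = Q(S_{1:T}) \prod_{m=1}^{M} Q(x_{1:T}^{(m)}),
  \log P(y_{1:T}) \ge \mathbb{E}_{Q}\!\left[ \log P(S_{1:T}, \{x_{1:T}^{(m)}\}, y_{1:T}) \right] - \mathbb{E}_{Q}\!\left[ \log Q \right],

where the factor over the switch sequence can be updated with the forward-backward recursions for hidden Markov models and each factor over a continuous chain with the Kalman filter and smoother recursions, consistent with the abstract's description.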

[1] H. Rauch. Solutions to the linear smoothing problem, 1963.

[2] L. Baum, et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, 1970.

[3] K. Ito, et al. On State Estimation in Switching Environments, 1970.

[4] Donald B. Rubin, et al. Maximum Likelihood from Incomplete Data, 1972.

[5] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[6] M. Athans, et al. State Estimation for Discrete Systems with Switching Parameters, 1978, IEEE Transactions on Aerospace and Electronic Systems.

[7] R. Shumway, et al. An Approach to Time Series Smoothing and Forecasting Using the EM Algorithm, 1982.

[8] Brian Everitt. An Introduction to Latent Variable Models, 1984.

[9] Graham C. Goodwin, et al. Adaptive filtering prediction and control, 1984.

[10] L. Rabiner, et al. An introduction to hidden Markov models, 1986, IEEE ASSP Magazine.

[11] David J. Spiegelhalter, et al. Local computations with probabilities on graphical structures and their application to expert systems, 1990.

[12] G. Parisi. Statistical Field Theory, 1988.

[13] Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference, 1991, Morgan Kaufmann series in representation and reasoning.

[14] Keiji Kanazawa, et al. A model for reasoning about persistence and causation, 1989.

[15] James D. Hamilton. A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle, 1989.

[16] A. Hasman, et al. Probabilistic reasoning in intelligent systems: Networks of plausible inference, 1991.

[17] R. Shumway, et al. Dynamic linear models with switching, 1991.

[18] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[19] Biing-Hwang Juang, et al. Hidden Markov Models for Speech Recognition, 1991.

[20] Padhraic J. Smyth. Hidden Markov models for fault detection in dynamic systems, 1993.

[21] J. R. Rohlicek, et al. ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition, 1993, IEEE Trans. Speech Audio Process.

[22] Steven J. Nowlan, et al. Mixtures of Controllers for Jump Linear and Non-Linear Plants, 1993, NIPS.

[23] Li Deng, et al. A stochastic model of speech incorporating hierarchical nonstationarity, 1993, IEEE Trans. Speech Audio Process.

[24] Robert A. Jacobs, et al. Hierarchical Mixtures of Experts and the EM Algorithm, 1993, Neural Computation.

[25] Yoshua Bengio, et al. An Input Output HMM Architecture, 1994, NIPS.

[26] M. A. McClure, et al. Hidden Markov models of biological primary sequence information, 1994, Proceedings of the National Academy of Sciences of the United States of America.

[27] Michael I. Jordan, et al. Hierarchical Mixtures of Experts and the EM Algorithm, 1994, Neural Computation.

[28] Andreas S. Weigend, et al. Time Series Prediction: Forecasting the Future and Understanding the Past, 1994.

[29] R. Kohn, et al. On Gibbs sampling for state space models, 1994.

[30] Naonori Ueda, et al. Deterministic Annealing Variant of the EM Algorithm, 1994, NIPS.

[31] Chang-Jin Kim, et al. Dynamic linear models with Markov-switching, 1994.

[32] Stuart J. Russell, et al. Stochastic simulation algorithms for dynamic probabilistic networks, 1995, UAI.

[33] Michael I. Jordan, et al. Learning Fine Motion by Markov Mixtures of Experts, 1995, NIPS.

[34] Visakan Kadirkamanathan, et al. Recursive Estimation of Dynamic Modular RBF Networks, 1995, NIPS.

[35] Petros G. Voulgaris. On optimal ℓ∞ to ℓ∞ filtering, 1995, Autom.

[36] Michael I. Jordan, et al. Exploiting Tractable Substructures in Intractable Networks, 1995, NIPS.

[37] Michael Isard, et al. Learning to Track the Visual Motion of Contours, 1995, Artif. Intell.

[38] Geoffrey E. Hinton, et al. The EM algorithm for mixtures of factor analyzers, 1996.

[39] Geoffrey E. Hinton, et al. Parameter estimation for linear dynamical systems, 1996.

[40] S. Eddy. Hidden Markov models, 1996, Current Opinion in Structural Biology.

[41] Rajesh P. N. Rao, et al. A Class of Stochastic Models for Invariant Recognition, Motion, and Stereo, 1996.

[42] Michael Isard, et al. Contour Tracking by Stochastic Propagation of Conditional Density, 1996, ECCV.

[43] Zoubin Ghahramani. On Structured Variational Approximations, 1997.

[44] Geoffrey E. Hinton, et al. Modeling the manifolds of images of handwritten digits, 1997, IEEE Trans. Neural Networks.

[45] Joydeep Ghosh, et al. A mixture-of-experts framework for adaptive Kalman filtering, 1997, IEEE Trans. Syst. Man Cybern. Part B.

[46] Michael I. Jordan, et al. Probabilistic Independence Networks for Hidden Markov Probability Models, 1997, Neural Computation.

[47] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants, 1998, Learning in Graphical Models.

[48] Michael I. Jordan, et al. An Introduction to Variational Methods for Graphical Models, 1999, Machine Learning.