A Unifying Review of Linear Gaussian Models

Factor analysis, principal component analysis, mixtures of Gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of Gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
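As a rough illustration of what a single basic generative model can look like, the sketch below samples from a linear Gaussian state-space model, assuming the standard form x[t+1] = A x[t] + w, y[t] = C x[t] + v with w ~ N(0, Q) and v ~ N(0, R). The abstract does not spell out this notation, so the symbols A, C, Q, R, the function name `sample_linear_gaussian`, and the choice of initial state distribution are all assumptions made for illustration. Setting A = 0 removes the dynamics and gives a static model; with Q = I and diagonal R this corresponds to a factor-analysis-style model whose observation covariance is C Cᵀ + R.

```python
import numpy as np

def sample_linear_gaussian(A, C, Q, R, T, rng):
    """Draw T steps from x[t+1] = A x[t] + w, y[t] = C x[t] + v.

    w ~ N(0, Q) is state noise and v ~ N(0, R) is observation noise.
    The initial state is drawn from N(0, Q), which is an assumption.
    """
    k, p = A.shape[0], C.shape[0]
    x = np.zeros((T, k))
    y = np.zeros((T, p))
    x_t = rng.multivariate_normal(np.zeros(k), Q)
    for t in range(T):
        x[t] = x_t
        # Observation: linear map of the hidden state plus Gaussian noise.
        y[t] = C @ x_t + rng.multivariate_normal(np.zeros(p), R)
        # State update: linear dynamics plus Gaussian noise.
        x_t = A @ x_t + rng.multivariate_normal(np.zeros(k), Q)
    return x, y

rng = np.random.default_rng(0)
k, p, T = 2, 5, 2000                        # hypothetical sizes
C = rng.standard_normal((p, k))             # hypothetical observation matrix
R = np.diag(rng.uniform(0.1, 0.5, size=p))  # diagonal observation noise

# Static case: A = 0, so the hidden states are i.i.d. N(0, I).
x, y = sample_linear_gaussian(np.zeros((k, k)), C, np.eye(k), R, T, rng)

model_cov = C @ C.T + R                 # covariance implied by the static model
sample_cov = np.cov(y, rowvar=False)    # empirical covariance of the samples
```

Comparing `sample_cov` with `model_cov` gives a quick sanity check that the static special case behaves as expected; a nonzero A would instead produce temporally correlated states of the kind a Kalman filter model assumes.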
