Latent Variable Models

A powerful approach to probabilistic modelling involves supplementing a set of observed variables with additional latent, or hidden, variables. By defining a joint distribution over observed and latent variables, the distribution of the observed variables alone is then obtained by marginalization. This allows relatively complex distributions over the observed variables to be expressed in terms of more tractable joint distributions over the expanded variable space. A well-known example of a latent variable model is the mixture distribution, in which the latent variable is the discrete component label. In the case of continuous latent variables we obtain models such as factor analysis. The structure of such probabilistic models can be made particularly transparent by giving them a graphical representation, usually in terms of a directed acyclic graph, or Bayesian network. In this chapter we provide an overview of latent variable models for representing continuous variables. We show how a particular form of linear latent variable model can be used to provide a probabilistic formulation of the well-known technique of principal components analysis (PCA). By extending this technique to mixtures, and hierarchical mixtures, of probabilistic PCA models we are led to a powerful interactive algorithm for data visualization. We also show how the probabilistic PCA approach can be generalized to non-linear latent variable models, leading to the Generative Topographic Mapping (GTM) algorithm. Finally, we show how GTM can itself be extended to model temporal data.
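As a brief illustration of the marginalization idea and of the linear model underlying probabilistic PCA (a minimal sketch in standard notation, following the Tipping and Bishop formulation rather than any equations given in this summary; here x denotes the observed variables, z the latent variables, W a loading matrix, \mu a mean vector, and \sigma^2 an isotropic noise variance):

    p(x) = \int p(x \mid z) \, p(z) \, dz

For probabilistic PCA the prior and conditional are both Gaussian,

    p(z) = \mathcal{N}(z \mid 0, I), \qquad p(x \mid z) = \mathcal{N}(x \mid W z + \mu, \sigma^2 I),

so the marginal distribution of the observed variables is itself Gaussian,

    p(x) = \mathcal{N}(x \mid \mu, W W^\top + \sigma^2 I),

with conventional PCA recovered in the zero-noise limit \sigma^2 \to 0.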
