Latent Variables, Topographic Mappings and Data Visualization

Most pattern recognition tasks, such as regression, classification and novelty detection, can be viewed in terms of probability density estimation. A powerful approach to probabilistic modelling is to represent the observed variables in terms of a number of hidden, or latent, variables. One well-known example of a hidden variable model is the mixture distribution, in which the hidden variable is the discrete component label. In the case of continuous latent variables we obtain models such as factor analysis. In this paper we provide an overview of latent variable models, and we show how a particular form of linear latent variable model can be used to provide a probabilistic formulation of the well-known technique of principal components analysis (PCA). By extending this technique to mixtures, and hierarchical mixtures, of probabilistic PCA models, we are led to a powerful interactive algorithm for data visualization. We also show how the probabilistic PCA approach can be generalized to non-linear latent variable models, leading to the Generative Topographic Mapping (GTM) algorithm. Finally, we show how GTM can itself be extended to model temporal data.
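
To make the linear latent variable model concrete, the following is a minimal sketch of fitting probabilistic PCA and projecting data into the latent space for visualization. It assumes the standard closed-form maximum-likelihood solution for probabilistic PCA (an isotropic Gaussian noise model with variance sigma2 and q latent dimensions, q smaller than the data dimension), which the abstract itself does not spell out. The function names ppca_ml and posterior_mean are hypothetical and chosen only for illustration.

```python
import numpy as np


def ppca_ml(X, q):
    """Closed-form maximum-likelihood fit of a probabilistic PCA model.

    Assumed model: x = W z + mu + noise, with z ~ N(0, I_q) and
    noise ~ N(0, sigma2 * I_d). Requires q < d.
    """
    mu = X.mean(axis=0)
    # Maximum-likelihood (biased) sample covariance of the data.
    S = np.cov(X - mu, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(S)
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance: average of the discarded eigenvalues.
    sigma2 = eigvals[q:].mean()
    # Principal directions scaled by sqrt(lambda_i - sigma2).
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = eigvecs[:, :q] * scale
    return mu, W, sigma2


def posterior_mean(X, mu, W, sigma2):
    """Posterior mean of the latent variable for each row of X."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (X - mu).T).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
    mu, W, sigma2 = ppca_ml(X, q=2)
    Z = posterior_mean(X, mu, W, sigma2)  # 2-D coordinates for plotting
    print(Z.shape)
```

Plotting the rows of Z as points gives a two-dimensional view of the data. The mixture and hierarchical extensions mentioned above would fit several such models and weight them by responsibilities, which this sketch does not attempt.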
