Mixtures of Probabilistic Principal Component Analyzers

Principal component analysis (PCA) is one of the most popular techniques for processing, compressing, and visualizing data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Therefore, previous attempts to formulate mixture models for PCA have been ad hoc to some extent. In this article, PCA is formulated within a maximum likelihood framework, based on a specific form of gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analyzers, whose parameters can be determined using an expectation-maximization algorithm. We discuss the advantages of this model in the context of clustering, density modeling, and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
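To make the model in the abstract concrete: each probabilistic PCA component assumes x = Wz + mu + e, with latent z ~ N(0, I_q) and isotropic noise e ~ N(0, sigma^2 I_d), so the marginal is Gaussian with covariance C = W W^T + sigma^2 I_d, and a mixture of such analyzers is fit by EM over component responsibilities. The sketch below is not the paper's code; it is a minimal NumPy illustration under stated assumptions (the function names `fit_mppca` and `log_gaussian` and all variable names are hypothetical), and it solves each M-step exactly via an eigendecomposition of the responsibility-weighted covariance, which the closed-form maximum likelihood PPCA solution permits, rather than using the paper's iterative updates for W and sigma^2.

```python
# Minimal sketch of a mixture of probabilistic principal component
# analyzers (MPPCA) fit by EM. Assumes q < d. Names are illustrative,
# not from the original paper.
import numpy as np

def log_gaussian(X, mu, C):
    """Log N(x | mu, C) for each row of X."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(C)
    solved = np.linalg.solve(C, diff.T).T          # C^{-1} (x - mu)
    maha = np.sum(diff * solved, axis=1)           # Mahalanobis terms
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def fit_mppca(X, n_components=2, q=1, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialize: uniform mixing weights, means at random data points.
    pi = np.full(n_components, 1.0 / n_components)
    mu = X[rng.choice(n, n_components, replace=False)]
    W = rng.normal(size=(n_components, d, q))
    sigma2 = np.ones(n_components)

    for _ in range(n_iter):
        # E-step: responsibilities from the marginal densities,
        # using the covariance C_k = W_k W_k^T + sigma2_k I.
        log_r = np.stack([
            np.log(pi[k]) + log_gaussian(
                X, mu[k], W[k] @ W[k].T + sigma2[k] * np.eye(d))
            for k in range(n_components)
        ], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilize exp
        R = np.exp(log_r)
        R /= R.sum(axis=1, keepdims=True)

        # M-step: refit each analyzer from its weighted statistics.
        for k in range(n_components):
            rk = R[:, k]
            Nk = rk.sum()
            pi[k] = Nk / n
            mu[k] = rk @ X / Nk
            diff = X - mu[k]
            S = (rk[:, None] * diff).T @ diff / Nk  # weighted covariance
            evals, evecs = np.linalg.eigh(S)        # ascending order
            evals, evecs = evals[::-1], evecs[:, ::-1]
            # Noise variance: mean of the d - q discarded eigenvalues.
            sigma2[k] = evals[q:].mean()
            # Columns of W span the local principal subspace, scaled
            # by (eigenvalue - noise variance)^(1/2).
            W[k] = evecs[:, :q] * np.sqrt(
                np.maximum(evals[:q] - sigma2[k], 0.0))
    return pi, mu, W, sigma2

# Usage on synthetic data (hypothetical):
# pi, mu, W, sigma2 = fit_mppca(np.random.randn(500, 5), n_components=3, q=2)
```

Because each component is a proper Gaussian density, the mixture supports the uses the abstract lists: responsibilities give a soft clustering, the weighted densities give a density model, and projecting onto each component's W gives a local linear dimensionality reduction.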
