A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

We introduce a new perspective on spectral dimensionality reduction that views these methods as Gaussian Markov random fields (GRFs). Our unifying perspective is based on the maximum entropy principle, which is in turn inspired by maximum variance unfolding. The resulting model, which we call maximum entropy unfolding (MEU), is a nonlinear generalization of principal component analysis. We relate the model to Laplacian eigenmaps and Isomap. We show that parameter fitting in locally linear embedding (LLE) is approximate maximum likelihood in the MEU model. We introduce a variant of LLE that performs maximum likelihood exactly: acyclic LLE (ALLE). We show that MEU and ALLE are competitive with the leading spectral approaches on a robot navigation visualization task and a human motion capture data set. Finally, the maximum likelihood perspective allows us to introduce a new approach to dimensionality reduction based on L1 regularization of the Gaussian random field via the graphical lasso.
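
To make the GRF reading concrete, the Python sketch below illustrates the two constructions the abstract mentions: a Laplacian-eigenmaps-style embedding in which the graph Laplacian of a k-nearest-neighbour graph plays the role of the GRF precision matrix, and an L1-regularized variant that learns a sparse precision over the data points with scikit-learn's GraphicalLasso. The function names are our own, and the graphical lasso step is one plausible reading of the approach under stated assumptions, not the paper's reference implementation.

    # Minimal sketch of the GRF view of spectral embedding, assuming a data
    # matrix Y of shape (n_points, n_features). Helper names are hypothetical.
    import numpy as np
    from scipy.linalg import eigh
    from sklearn.neighbors import kneighbors_graph
    from sklearn.covariance import GraphicalLasso

    def laplacian_grf_embedding(Y, n_components=2, n_neighbors=10):
        """Laplacian-eigenmaps-style embedding, read as maximum likelihood
        in a Gaussian random field whose precision is the graph Laplacian."""
        # Symmetrized k-nearest-neighbour adjacency fixes the GRF's sparsity pattern.
        W = kneighbors_graph(Y, n_neighbors, mode='connectivity').toarray()
        W = np.maximum(W, W.T)
        L = np.diag(W.sum(axis=1)) - W   # graph Laplacian acts as the precision
        vals, vecs = eigh(L)             # eigenvalues in ascending order
        # Skip the (near-)constant eigenvector with eigenvalue ~0; keep the next
        # n_components directions of smallest precision (largest variance).
        return vecs[:, 1:n_components + 1]

    def glasso_grf_embedding(Y, n_components=2, alpha=0.1):
        """Sketch of the L1-regularized variant: learn a sparse precision over
        the n data points via the graphical lasso (features act as i.i.d.
        samples), then embed with the smallest-eigenvalue eigenvectors."""
        Yt = Y.T - Y.T.mean(axis=0)      # shape (n_features, n_points)
        K = GraphicalLasso(alpha=alpha).fit(Yt).precision_
        vals, vecs = eigh(K)
        return vecs[:, :n_components]

For example, laplacian_grf_embedding(Y, n_components=2) recovers the usual Laplacian eigenmaps coordinates; the graphical lasso variant differs in that the neighbourhood structure is learned from the data through the sparsity penalty rather than fixed in advance by k-nearest neighbours.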
