The Bigraphical Lasso

The i.i.d. assumption is endemic in machine learning, but often flawed. Complex data sets exhibit partial correlations between both instances and features. A model specifying both types of correlation can have a number of parameters that scales quadratically with the number of features and data points. We introduce the bigraphical lasso, an estimator for precision matrices of matrix-normal distributions based on the Cartesian product of graphs. A prominent product in spectral graph theory, this structure has appealing properties for regression, enhanced sparsity and interpretability. To deal with the parameter explosion, we introduce ℓ1 penalties and fit the model through a flip-flop algorithm that results in a linear number of lasso regressions. We demonstrate the performance of our approach with simulations and an example from the COIL image data set.
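
As a rough illustration of the structure described above, the following sketch builds a joint precision matrix from the Cartesian product of two small graphs, written as the Kronecker sum Ψ ⊗ I + I ⊗ Θ. The helper names (`kronecker_sum`, `chain_precision`), the chain-graph factors, and the vectorisation order are assumptions made for this example; it shows only the model structure, not the penalised flip-flop estimator itself.

```python
import numpy as np


def kronecker_sum(psi, theta):
    """Kronecker sum psi (+) theta = psi (x) I_p + I_n (x) theta."""
    n = psi.shape[0]
    p = theta.shape[0]
    return np.kron(psi, np.eye(p)) + np.kron(np.eye(n), theta)


def chain_precision(size, coupling=-0.4):
    """Hypothetical sparse precision matrix whose graph is a chain (path).

    Unit diagonal and |coupling| < 0.5 keep the matrix positive definite.
    """
    mat = np.eye(size)
    idx = np.arange(size - 1)
    mat[idx, idx + 1] = coupling
    mat[idx + 1, idx] = coupling
    return mat


n, p = 4, 3                      # instances (rows) and features (columns)
psi = chain_precision(n)         # assumed precision over instances
theta = chain_precision(p)       # assumed precision over features

# Joint precision over the n * p entries of the data matrix.
omega = kronecker_sum(psi, theta)

# Compare sparsity with the Kronecker product of the same two factors.
nnz_sum = np.count_nonzero(omega)
nnz_prod = np.count_nonzero(np.kron(psi, theta))
print(f"non-zeros, Kronecker sum:     {nnz_sum}")
print(f"non-zeros, Kronecker product: {nnz_prod}")
```

In this toy setting the Kronecker sum has non-zero off-diagonal entries only where two entries of the data matrix share a row or a column and are neighbours in the corresponding factor graph, which is the edge set of the Cartesian product graph. The Kronecker product of the same factors is denser.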
