Spectral Regularization Algorithms for Learning Large Incomplete Matrices

We use convex relaxation techniques to provide a sequence of regularized low-rank solutions for large-scale matrix completion problems. Using the nuclear norm as a regularizer, we provide a simple and very efficient convex algorithm for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm Soft-Impute iteratively replaces the missing elements with those obtained from a soft-thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions on a grid of values of the regularization parameter. The computationally intensive part of our algorithm is in computing a low-rank SVD of a dense matrix. Exploiting the problem structure, we show that the task can be performed with a complexity linear in the matrix dimensions. Our semidefinite-programming algorithm is readily scalable to large matrices: for example it can obtain a rank-80 approximation of a 10(6) × 10(6) incomplete matrix with 10(5) observed entries in 2.5 hours, and can fit a rank 40 approximation to the full Netflix training set in 6.6 hours. Our methods show very good performance both in training and test error when compared to other competitive state-of-the art techniques.

[1]  Shiqian Ma,et al.  Fixed point and Bregman iterative methods for matrix rank minimization , 2009, Math. Program..

[2]  Francis R. Bach,et al.  A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization , 2008, J. Mach. Learn. Res..

[3]  R. Tibshirani,et al.  PATHWISE COORDINATE OPTIMIZATION , 2007, 0708.1485.

[4]  Cun-Hui Zhang PENALIZED LINEAR UNBIASED SELECTION , 2007 .

[5]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[6]  Emmanuel J. Candès,et al.  The Power of Convex Relaxation , 2010 .

[7]  T. Hastie,et al.  SparseNet: Coordinate Descent With Nonconvex Penalties , 2011, Journal of the American Statistical Association.

[8]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[9]  Pablo A. Parrilo,et al.  Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization , 2007, SIAM Rev..

[10]  Trevor Hastie,et al.  Imputing Missing Data for Gene Expression Arrays , 2001 .

[11]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[12]  Andrea Montanari,et al.  Matrix completion from a few entries , 2009, 2009 IEEE International Symposium on Information Theory.

[13]  Shuiwang Ji,et al.  SLEP: Sparse Learning with Efficient Projections , 2011 .

[14]  H. Wold Soft Modelling by Latent Variables: The Non-Linear Iterative Partial Least Squares (NIPALS) Approach , 1975, Journal of Applied Probability.

[15]  J. Friedman Fast sparse regression and classification , 2012 .

[16]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[17]  Yehuda Koren,et al.  Lessons from the Netflix prize challenge , 2007, SKDD.

[18]  Francis R. Bach,et al.  Consistency of trace norm minimization , 2007, J. Mach. Learn. Res..

[19]  Nathan Srebro,et al.  Fast maximum margin matrix factorization for collaborative prediction , 2005, ICML.

[20]  Domonkos Tikk,et al.  Scalable Collaborative Filtering Approaches for Large Recommender Systems , 2009, J. Mach. Learn. Res..

[21]  Patrick L. Combettes,et al.  Signal Recovery by Proximal Forward-Backward Splitting , 2005, Multiscale Model. Simul..

[22]  Massimiliano Pontil,et al.  Multi-Task Feature Learning , 2006, NIPS.

[23]  Klaus Nordhausen,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition by Trevor Hastie, Robert Tibshirani, Jerome Friedman , 2009 .

[24]  R. Larsen Lanczos Bidiagonalization With Partial Reorthogonalization , 1998 .

[25]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[26]  Jieping Ye,et al.  An accelerated gradient method for trace norm minimization , 2009, ICML '09.

[27]  Renato D. C. Monteiro,et al.  Local Minima and Convergence in Low-Rank Semidefinite Programming , 2005, Math. Program..

[28]  Emmanuel J. Candès,et al.  A Singular Value Thresholding Algorithm for Matrix Completion , 2008, SIAM J. Optim..

[29]  Cun-Hui Zhang Nearly unbiased variable selection under minimax concave penalty , 2010, 1002.4734.

[30]  Massimiliano Pontil,et al.  Convex multi-task feature learning , 2008, Machine Learning.

[31]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[32]  Emmanuel J. Candès,et al.  Exact Matrix Completion via Convex Optimization , 2008, Found. Comput. Math..

[33]  Donald B. Rubin,et al.  Max-imum Likelihood from Incomplete Data , 1972 .

[34]  Lieven Vandenberghe,et al.  Interior-Point Method for Nuclear Norm Approximation with Application to System Identification , 2009, SIAM J. Matrix Anal. Appl..

[35]  Peng Zhao,et al.  On Model Selection Consistency of Lasso , 2006, J. Mach. Learn. Res..

[36]  I. Johnstone,et al.  Wavelet Shrinkage: Asymptopia? , 1995 .

[37]  Dennis DeCoste,et al.  Collaborative prediction using ensembles of Maximum Margin Matrix Factorizations , 2006, ICML.

[38]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[39]  Noga Alon,et al.  Generalization Error Bounds for Collaborative Prediction with Low-Rank Matrices , 2004, NIPS.

[40]  Geoffrey E. Hinton,et al.  Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.

[41]  Emmanuel J. Candès,et al.  The Power of Convex Relaxation: Near-Optimal Matrix Completion , 2009, IEEE Transactions on Information Theory.

[42]  Russ B. Altman,et al.  Missing value estimation methods for DNA microarrays , 2001, Bioinform..

[43]  Tommi S. Jaakkola,et al.  Maximum-Margin Matrix Factorization , 2004, NIPS.

[44]  Tommi S. Jaakkola,et al.  Weighted Low-Rank Approximations , 2003, ICML.

[45]  Y. Nesterov Gradient methods for minimizing composite objective function , 2007 .