Paper Information - Spectral Regularization Algorithms for Learning Large Incomplete Matrices

We use convex relaxation techniques to provide a sequence of regularized low-rank solutions for large-scale matrix completion problems. Using the nuclear norm as a regularizer, we provide a simple and very efficient convex algorithm for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm Soft-Impute iteratively replaces the missing elements with those obtained from a soft-thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions on a grid of values of the regularization parameter. The computationally intensive part of our algorithm is in computing a low-rank SVD of a dense matrix. Exploiting the problem structure, we show that the task can be performed with a complexity linear in the matrix dimensions. Our semidefinite-programming algorithm is readily scalable to large matrices: for example, it can obtain a rank-80 approximation of a 10^6 × 10^6 incomplete matrix with 10^5 observed entries in 2.5 hours, and can fit a rank-40 approximation to the full Netflix training set in 6.6 hours. Our methods show very good performance both in training and test error when compared to other competitive state-of-the-art techniques.
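The iteration described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a dense full SVD (`np.linalg.svd`) rather than the low-rank SVD that exploits problem structure, and the function names, the boolean `observed` mask, the stopping rule, and the default tolerances are all hypothetical choices made for this example.

```python
import numpy as np


def soft_threshold_svd(Z, lam):
    """Shrink every singular value of Z by lam, flooring at zero."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    return (U * s_shrunk) @ Vt


def soft_impute(X, observed, lam, Z_init=None, max_iter=500, tol=1e-6):
    """Fill missing entries from the current estimate, then soft-threshold.

    X        : 2-D array; values at unobserved positions are ignored.
    observed : boolean mask of the same shape, True where X is observed.
    lam      : regularization parameter (amount of singular-value shrinkage).
    Z_init   : optional warm start (assumed interface, not from the paper).
    """
    Z = np.zeros(X.shape, dtype=float) if Z_init is None else Z_init.copy()
    for _ in range(max_iter):
        # Keep observed entries, take the rest from the current estimate.
        filled = np.where(observed, X, Z)
        Z_new = soft_threshold_svd(filled, lam)
        # Relative squared Frobenius change (a stopping rule assumed here).
        change = np.linalg.norm(Z_new - Z) ** 2 / max(np.linalg.norm(Z) ** 2, 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z


def regularization_path(X, observed, lambdas):
    """Solve on a grid of lambda values, largest first, with warm starts."""
    Z = None
    path = []
    for lam in sorted(lambdas, reverse=True):
        Z = soft_impute(X, observed, lam, Z_init=Z)
        path.append((lam, Z))
    return path
```

Running the grid from the largest regularization value down means each solve starts from the previous solution, which is the warm-start idea mentioned in the abstract. The dense SVD here costs far more than the structured low-rank SVD the abstract refers to, so this sketch is only suitable for small matrices.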
