Classification with Low Rank and Missing Data

We consider classification and regression tasks where we have missing data and assume that the (clean) data resides in a low-rank subspace. Finding a hidden subspace is known to be computationally hard. Nevertheless, using a non-proper formulation we give an efficient agnostic algorithm that classifies as well as the best linear classifier coupled with the best low-dimensional subspace in which the data resides. A direct implication is that our algorithm can linearly (and non-linearly through kernels) classify provably as well as the best classifier that has access to the full data.
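To make the problem setting concrete, the following is a minimal, self-contained sketch of the setup the abstract describes: data generated from a low-rank subspace, entries observed at random, and a linear classifier trained on what is observed. It uses a simple zero-imputation baseline with hinge-loss subgradient descent, not the paper's non-proper kernel algorithm; all function names and parameters (make_low_rank_data, mask_entries, observe_prob, and so on) are hypothetical choices for illustration.

    # Illustrative sketch of the problem setting only; NOT the paper's algorithm.
    import numpy as np

    rng = np.random.default_rng(0)

    def make_low_rank_data(n, d, r):
        """Sample n points from a random r-dimensional subspace of R^d,
        labeled by a linear classifier on the clean data."""
        basis = rng.standard_normal((r, d))   # subspace basis
        coeffs = rng.standard_normal((n, r))
        X = coeffs @ basis                    # clean, rank-r data
        w_true = rng.standard_normal(d)
        y = np.sign(X @ w_true)
        return X, y

    def mask_entries(X, observe_prob):
        """Keep each entry independently with probability observe_prob;
        unobserved entries are set to zero (zero imputation)."""
        mask = rng.random(X.shape) < observe_prob
        return X * mask, mask

    def train_linear(X, y, lr=0.1, epochs=200, lam=1e-3):
        """Plain subgradient descent on the L2-regularized hinge loss."""
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            margins = y * (X @ w)
            viol = margins < 1
            grad = -(X[viol] * y[viol][:, None]).sum(axis=0) / n + lam * w
            w -= lr * grad
        return w

    # Clean low-rank data, then hide about half of the entries.
    X, y = make_low_rank_data(n=2000, d=50, r=5)
    X_obs, _ = mask_entries(X, observe_prob=0.5)

    w = train_linear(X_obs, y)
    print("accuracy on clean data:", np.mean(np.sign(X @ w) == y))

The paper's guarantee is stronger than what this baseline achieves: its agnostic algorithm provably competes with the best linear classifier paired with the best low-dimensional subspace, without ever reconstructing the hidden subspace explicitly.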
