Robust Non-Negative Dictionary Learning

Dictionary learning plays an important role in machine learning, where data vectors are modeled as sparse linear combinations of basis factors (i.e., a dictionary). However, how to conduct dictionary learning in noisy environments has not been well studied. Moreover, in practice, the dictionary (i.e., the low-rank approximation of the data matrix) and the sparse representations are often required to be non-negative, as in applications such as image annotation, document summarization, and microarray analysis. In this paper, we propose a new formulation for non-negative dictionary learning in noisy environments, in which structured sparsity is enforced on the sparse representation. Because it uses a robust loss function, the proposed formulation is also robust to data with noise and outliers. We derive an efficient multiplicative updating algorithm to solve the optimization problem, in which the dictionary and the sparse representation are updated iteratively. We rigorously prove the convergence and correctness of the proposed algorithm. We show how the learned dictionary differs across levels of the sparsity constraint. The proposed algorithm can be adapted for clustering and semi-supervised learning.
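The abstract does not state the objective or the update rules, so the following is only a minimal sketch of the general idea of alternating multiplicative updates under a robust loss. It assumes an L2,1-norm reconstruction loss (the sum of the Euclidean norms of the per-sample residuals) and uses the well-known multiplicative updates for that loss. The structured-sparsity regularizer of the proposed formulation is omitted, and the function name `robust_nmf_l21` and all of its parameters are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np


def robust_nmf_l21(X, k, n_iter=200, eps=1e-10, seed=0):
    """Sketch: minimize sum_i ||x_i - F g_i||_2 subject to F >= 0, G >= 0.

    X is a non-negative (features x samples) matrix, F is the dictionary
    and G holds the codes. This is an assumed stand-in for the paper's
    formulation; no sparsity penalty is included.
    """
    rng = np.random.default_rng(seed)
    p, n = X.shape
    F = rng.random((p, k)) + eps
    G = rng.random((k, n)) + eps

    def sample_weights(F, G):
        # Per-sample residual norms and their inverses (diagonal of D).
        norms = np.linalg.norm(X - F @ G, axis=0)
        return norms, 1.0 / np.maximum(norms, eps)

    history = []
    for _ in range(n_iter):
        norms, d = sample_weights(F, G)
        history.append(norms.sum())
        # Dictionary update: F <- F * (X D G^T) / (F G D G^T).
        # Multiplying by d scales column i by d[i], i.e. right-multiplies by D.
        F *= ((X * d) @ G.T) / (((F @ G) * d) @ G.T + eps)

        # Recompute the weights so the code update uses the current residual.
        _, d = sample_weights(F, G)
        # Code update: G <- G * (F^T X D) / (F^T F G D).
        G *= (F.T @ (X * d)) / ((F.T @ F @ G) * d + eps)

    history.append(np.linalg.norm(X - F @ G, axis=0).sum())
    return F, G, history
```

Because each sample's residual enters the loss through its norm rather than its squared norm, samples with large residuals receive small weights `d`, which is what limits the influence of outliers in this sketch. Both factors stay non-negative because every update multiplies a non-negative matrix by a non-negative ratio.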
