Towards Structural Sparsity: An Explicit l2/l0 Approach

In many machine learning and data mining applications, the goal is not only to build accurate black-box predictors, but also to discover predictive patterns in the data that improve our interpretation and understanding of the underlying physical, biological, and other natural processes. Sparse representation is one focus of work in this direction, and structural sparsity in particular has recently attracted increasing attention. Structural sparsity is typically achieved by imposing l2/l1 norms. In this paper, we present the explicit l2/l0 norm as a direct route to structural sparsity. To tackle the intractable l2/l0 optimization problem, we develop a general Lipschitz auxiliary function that leads to simple iterative algorithms: in each iteration, the induced sub-problem is solved optimally, and convergence of the overall algorithm is guaranteed. Furthermore, the local convergence rate is theoretically bounded. We evaluate our optimization techniques on the multi-task feature learning problem, and experimental results suggest that our approaches outperform competing methods on both synthetic and real-world data sets.
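To make the two penalties concrete, the sketch below contrasts the convex l2/l1 norm (sum of row-wise l2 norms) with the l2/l0 norm (count of nonzero rows) on a task-by-feature weight matrix, and shows one generic proximal-gradient loop that enforces l2/l0 row sparsity via hard thresholding. This is a minimal NumPy illustration, not the paper's actual auxiliary-function algorithm; the least-squares loss, function names, and step-size choice are assumptions made for the example. The threshold sqrt(2*lam/L) is the standard proximal operator of the penalty lam * ||W||_{2,0}.

import numpy as np

def l21_norm(W):
    # ||W||_{2,1} = sum_i ||w_i||_2 : convex surrogate for row sparsity
    return float(np.sum(np.linalg.norm(W, axis=1)))

def l20_norm(W, tol=1e-12):
    # ||W||_{2,0} = #{i : ||w_i||_2 > 0} : number of selected features
    return int(np.sum(np.linalg.norm(W, axis=1) > tol))

def row_hard_threshold(W, lam, L):
    # Exact proximal step for lam * ||W||_{2,0} with step size 1/L:
    # a row survives iff (L/2)||w_i||^2 > lam, i.e. ||w_i|| > sqrt(2*lam/L).
    W = W.copy()
    W[np.linalg.norm(W, axis=1) <= np.sqrt(2.0 * lam / L)] = 0.0
    return W

def proximal_gradient_mtfl(X, Y, lam, n_iter=200):
    # Generic multi-task least squares with an l2/l0 row-sparsity penalty
    # (illustrative stand-in for the paper's method). L is the Lipschitz
    # constant of the gradient of 0.5 * ||XW - Y||_F^2.
    L = np.linalg.norm(X, 2) ** 2
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)
        W = row_hard_threshold(W - grad / L, lam, L)
    return W

Because the l2/l0 sub-problem decouples across rows, each proximal step is solved optimally in closed form, which is the kind of per-iteration exactness the abstract refers to; the nonconvexity of the overall objective is what makes the convergence analysis the hard part.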
