Toward structural sparsity: an explicit $$\ell _{2}/\ell _0$$ approach

As powerful tools, machine learning and data mining techniques have been widely applied in various areas. However, in many real-world applications, besides establishing accurate black-box predictors, we are also interested in white-box mechanisms, such as discovering predictive patterns in data that enhance our understanding of underlying physical, biological and other natural processes. For these purposes, sparse representation and its variations have been a major focus. More recently, structural sparsity has attracted increasing attention. In previous research, structural sparsity was often achieved by imposing convex but non-smooth norms such as the $${\ell _{2}/\ell _{1}}$$ and group $${\ell _{2}/\ell _{1}}$$ norms. In this paper, we present the explicit $${\ell _2/\ell _0}$$ and group $${\ell _2/\ell _0}$$ norms to approach structural sparsity directly. To tackle the resulting intractable $${\ell _2/\ell _0}$$ optimization problems, we develop a general Lipschitz auxiliary function that leads to simple iterative algorithms. In each iteration, the induced subproblem is solved optimally, and a convergence guarantee is provided. Furthermore, the local convergence rate is theoretically bounded. We test our optimization techniques on the multi-task feature learning problem. Experimental results suggest that our approaches outperform competing approaches on both synthetic and real-world data sets.
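The abstract does not define the norms or the algorithm, so the following is a minimal sketch under stated assumptions. It assumes the common multi-task convention in which $$W \in \mathbb{R}^{d\times k}$$ stacks one weight column per task, the explicit $${\ell _2/\ell _0}$$ norm counts the rows of $$W$$ with nonzero $$\ell_2$$ norm (features used by any task), and the convex $${\ell _2/\ell _1}$$ norm sums those row norms. The solver minimizes a least-squares loss plus $$\lambda\|W\|_{2,0}$$ by repeatedly minimizing a quadratic upper bound built from the Lipschitz constant $$L$$ of the loss gradient; that subproblem separates over rows and is solved exactly by row-wise hard thresholding. This generic iterative hard-thresholding scheme is swapped in for illustration and is not necessarily the authors' auxiliary function; all function names (`l20_norm`, `l21_norm`, `row_hard_threshold`, `l20_multitask_ls`) are hypothetical.

```python
import numpy as np


def l21_norm(W):
    """Convex l2/l1 norm: sum of the l2 norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def l20_norm(W, tol=0.0):
    """Explicit l2/l0 'norm': number of rows of W whose l2 norm exceeds tol."""
    return int(np.count_nonzero(np.linalg.norm(W, axis=1) > tol))


def row_hard_threshold(V, thresh):
    """Exact minimizer of (L/2)*||W - V||_F^2 + lam*||W||_{2,0}, thresh = 2*lam/L.

    The problem separates over rows: zeroing row i costs (L/2)*||v_i||^2,
    keeping it unchanged costs lam, so a row survives iff ||v_i||^2 > thresh.
    """
    keep = np.sum(V * V, axis=1) > thresh
    return V * keep[:, None]


def l20_multitask_ls(X, Y, lam, n_iter=500):
    """Hypothetical majorize-minimize solver (assumption, not the paper's method) for
        min_W 0.5*||X W - Y||_F^2 + lam*||W||_{2,0}.

    Each step exactly minimizes the quadratic upper bound of the loss given by
    the Lipschitz constant L of its gradient, so the objective never increases.
    """
    L = np.linalg.norm(X, 2) ** 2  # squared spectral norm = Lipschitz constant
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)
        W = row_hard_threshold(W - grad / L, 2.0 * lam / L)
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic toy data: 100 samples, 20 features, 4 tasks;
    # only features 0, 5 and 12 are relevant to every task.
    X, _ = np.linalg.qr(rng.standard_normal((100, 20)))  # orthonormal columns
    W_true = np.zeros((20, 4))
    W_true[[0, 5, 12]] = 1.0
    Y = X @ W_true + 0.01 * rng.standard_normal((100, 4))
    W_hat = l20_multitask_ls(X, Y, lam=0.1)
    print("selected features:", np.flatnonzero(np.linalg.norm(W_hat, axis=1)))
    print("l2/l0 =", l20_norm(W_hat), " l2/l1 =", round(l21_norm(W_hat), 3))
```

With orthonormal columns in `X` the iteration reaches a fixed point after the first step and should select features 0, 5 and 12; a general design matrix may need more iterations and a careful choice of $$\lambda$$.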
