Sparsity and smoothness via the fused lasso

Summary.  The lasso penalizes a least squares regression by the sum of the absolute values (L1‐norm) of the coefficients. The form of this penalty encourages sparse solutions (with many coefficients equal to 0). We propose the ‘fused lasso’, a generalization that is designed for problems with features that can be ordered in some meaningful way. The fused lasso penalizes the L1‐norm of both the coefficients and their successive differences. Thus it encourages sparsity of the coefficients and also sparsity of their differences—i.e. local constancy of the coefficient profile. The fused lasso is especially useful when the number of features p is much greater than N, the sample size. The technique is also extended to the ‘hinge’ loss function that underlies the support vector classifier. We illustrate the methods on examples from protein mass spectroscopy and gene expression data.
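The summary describes the penalty in words; as a sketch in standard lasso notation, for responses y_i and features x_{ij} ordered in j, the fused lasso criterion it describes can be written as (the tuning bounds s_1 and s_2 are notational assumptions, not taken from the text above):

```latex
\hat{\beta} \;=\; \arg\min_{\beta} \sum_{i=1}^{N} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
\quad \text{subject to} \quad
\sum_{j=1}^{p} \lvert \beta_j \rvert \le s_1
\;\;\text{and}\;\;
\sum_{j=2}^{p} \lvert \beta_j - \beta_{j-1} \rvert \le s_2 .
```

The first constraint is the ordinary lasso bound and encourages sparsity of the coefficients; the second bounds the total variation of the coefficient profile and encourages local constancy, matching the two forms of sparsity described above.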
