A LASSO FOR HIERARCHICAL INTERACTIONS.

We add a set of convex constraints to the lasso to produce sparse interaction models that honor the hierarchy restriction: an interaction is included in a model only if one or both of its variables are marginally important. We give a precise characterization of the effect of this hierarchy constraint, prove that hierarchy holds with probability one, and derive an unbiased estimate for the degrees of freedom of our estimator. A bound on this estimate reveals the amount of fitting "saved" by the hierarchy constraint. We distinguish between parameter sparsity (the number of nonzero coefficients) and practical sparsity (the number of raw variables one must measure to make a new prediction). Hierarchy focuses on the latter, which is more closely tied to important data collection concerns such as cost, time and effort. We develop an algorithm, available in the R package hierNet, and perform an empirical study of our method.
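The distinction between parameter sparsity and practical sparsity can be made concrete with a small sketch. The Python code below is an illustration only and is not the hierNet implementation: the function names, the `require_both` flag, and the representation of a fitted model as a main-effect vector `beta` plus a p-by-p interaction matrix `theta` are all assumptions made for this example.

```python
import numpy as np


def _active_sets(beta, theta, tol=0.0):
    """Return boolean masks for active main effects and active interaction pairs.

    Assumed representation (hypothetical): beta has length p, theta is p x p,
    and the pair (j, k) counts as active if either theta[j, k] or theta[k, j]
    is nonzero. The diagonal is ignored.
    """
    beta = np.asarray(beta, dtype=float)
    theta = np.asarray(theta, dtype=float)
    main = np.abs(beta) > tol
    inter = (np.abs(theta) > tol) | (np.abs(theta.T) > tol)
    np.fill_diagonal(inter, False)
    return main, inter


def parameter_sparsity(beta, theta, tol=0.0):
    """Number of nonzero coefficients, counting each interaction pair once."""
    main, inter = _active_sets(beta, theta, tol)
    return int(main.sum() + np.triu(inter, k=1).sum())


def practical_sparsity(beta, theta, tol=0.0):
    """Number of raw variables that must be measured to make a prediction."""
    main, inter = _active_sets(beta, theta, tol)
    return int((main | inter.any(axis=1)).sum())


def satisfies_hierarchy(beta, theta, require_both=False, tol=0.0):
    """Check that every active interaction has an active main effect.

    require_both=False: at least one of the two variables has a main effect.
    require_both=True: both variables have a main effect.
    """
    main, inter = _active_sets(beta, theta, tol)
    j, k = np.nonzero(np.triu(inter, k=1))
    ok = (main[j] & main[k]) if require_both else (main[j] | main[k])
    return bool(np.all(ok))


# Two toy models with p = 4 variables and three nonzero coefficients each.
beta = np.array([1.5, -2.0, 0.0, 0.0])

theta_hier = np.zeros((4, 4))
theta_hier[0, 1] = 0.7   # interaction between two variables with main effects

theta_free = np.zeros((4, 4))
theta_free[2, 3] = 0.7   # interaction between two variables without main effects

for name, theta in [("hierarchical", theta_hier), ("non-hierarchical", theta_free)]:
    print(name,
          parameter_sparsity(beta, theta),
          practical_sparsity(beta, theta),
          satisfies_hierarchy(beta, theta))
```

Both toy models have the same parameter sparsity, but the hierarchical one reuses variables that are already measured for the main effects, so it needs fewer raw variables at prediction time.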
