EP-GIG Priors and Applications in Bayesian Sparse Learning

In this paper we propose a novel framework for constructing sparsity-inducing priors. In particular, we define such priors as mixtures of exponential power distributions with a generalized inverse Gaussian mixing density (EP-GIG). EP-GIG is a variant of the generalized hyperbolic distributions, and its special cases include Gaussian scale mixtures and Laplace scale mixtures. Laplace scale mixtures, in turn, provide a Bayesian framework for sparse learning with nonconvex penalization. The densities of EP-GIG can be expressed in closed form, and the conditional posterior of the mixing variable is again generalized inverse Gaussian. We exploit these two properties to develop EM algorithms for sparse empirical Bayesian learning, and we show that these algorithms bear an interesting resemblance to iteratively reweighted ℓ2 or ℓ1 methods. Finally, we present two extensions, to grouped variable selection and to logistic regression.
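
To make the EM recipe concrete, below is a minimal sketch in Python of the simplest special case: sparse linear regression under a Laplace prior written as a Gaussian scale mixture (an EP-GIG member with exponential power exponent 2). The conditional posterior of the mixing scale is generalized inverse Gaussian, so the E-step expectation E[1/η_j | b_j] has the closed form γ/|b_j|, and the M-step reduces to a weighted ridge solve, i.e. one iteratively reweighted ℓ2 update. The function name and the hyperparameters (gamma_, sigma2, n_iter, eps) are illustrative choices, not the paper's notation or its exact algorithm.

```python
# Minimal sketch (not the authors' exact algorithm) of EM for sparse linear
# regression with a Laplace prior expressed as a Gaussian scale mixture,
# the simplest special case of the EP-GIG family.
import numpy as np

def em_laplace_regression(X, y, gamma_=1.0, sigma2=1.0, n_iter=50, eps=1e-8):
    """EM for y = X b + noise with a Laplace(gamma_) prior on b."""
    n, p = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]  # initialize at least squares
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        # E-step: the mixing scale's conditional posterior is GIG, giving the
        # closed form E[1/eta_j | b_j] = gamma_ / |b_j| in the Laplace case
        # (eps guards against division by zero for coefficients shrunk to 0).
        w = gamma_ / np.maximum(np.abs(b), eps)
        # M-step: weighted ridge regression -- one reweighted-l2 update.
        b = np.linalg.solve(XtX + sigma2 * np.diag(w), Xty)
    return b

# Toy usage: recover a sparse coefficient vector from noisy observations.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
b_true = np.zeros(20)
b_true[:3] = [3.0, -2.0, 1.5]
y = X @ b_true + 0.1 * rng.standard_normal(100)
print(np.round(em_laplace_regression(X, y, gamma_=5.0), 2))
```

Other EP-GIG members change only the E-step weights: a different mixing density yields a different closed-form expectation under the same GIG conditional posterior, while the reweighted ridge M-step is unchanged.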
