Adaptive Multi-Task Lasso: with Application to eQTL Detection

To understand the relationship between genomic variation across a population and complex diseases, it is essential to detect expression quantitative trait loci (eQTLs) that are associated with phenotypic effects. However, detecting eQTLs remains a challenge because the underlying mechanisms are complex and the number of genetic loci is very large compared to the number of samples. To address this problem, it is desirable to exploit the structure of the data and prior information about genomic locations, such as conservation scores and transcription factor binding sites. In this paper, we propose a novel regularized regression approach for detecting eQTLs that considers related traits simultaneously while incorporating many regulatory features. We first present a Bayesian network for a multi-task learning problem that includes priors on SNPs, making it possible to estimate the significance of each covariate adaptively. We then jointly find the maximum a posteriori (MAP) estimate of the regression coefficients and estimate the weights of the covariates. This optimization procedure is efficient because it can be carried out by alternating between projected gradient descent and coordinate descent. Experimental results on simulated and real yeast datasets confirm that our model outperforms previous methods for finding eQTLs.
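As a rough illustration of the two optimization primitives named above, the following Python sketch shows cyclic coordinate descent for a lasso with per-covariate penalties, and the Euclidean projection onto the probability simplex that a projected gradient step would use. It is not the paper's algorithm: it handles a single task only, the per-covariate penalties are fixed hypothetical inputs rather than quantities learned from regulatory features, and the function names (`soft_threshold`, `weighted_lasso_cd`, `project_to_simplex`) are invented for this example.

```python
from typing import List


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso_cd(X: List[List[float]], y: List[float],
                      penalties: List[float], n_iters: int = 200) -> List[float]:
    """Minimise 0.5 * ||y - X b||^2 + sum_j penalties[j] * |b_j|
    by cyclic coordinate descent (single task; an assumption of this sketch)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # equals y - X b because b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the residual that excludes b_j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, penalties[j]) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection onto {w : w_k >= 0, sum_k w_k = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumsum += u_k
        t = (cumsum - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold from the largest valid k
    return [max(x - theta, 0.0) for x in v]


if __name__ == "__main__":
    # Toy design with orthonormal columns; the second covariate has a weak signal.
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [3.0, 0.5]
    print(weighted_lasso_cd(X, y, penalties=[1.0, 1.0]))
    print(weighted_lasso_cd(X, y, penalties=[1.0, 0.1]))
    print(project_to_simplex([2.0, 0.0]))
```

In the toy run, lowering the penalty on the second covariate lets its weak signal survive shrinkage, which is the intuition behind adaptive, per-covariate penalties; how those penalties are tied to prior features and updated is specific to the paper and is not reproduced here.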
