A Permutation Approach to Testing Interactions for Binary Response by Comparing Correlations Between Classes

Testing for interactions in high dimensions remains a challenging task. Existing methods are often sensitive to modeling assumptions, and their nominal p-values rely heavily on asymptotic approximations. To help alleviate these issues, we propose a permutation-based method for testing marginal interactions with a binary response. Our method searches for pairwise correlations that differ between the two classes. In this article, we compare our method, on real and simulated data, to the standard approach of fitting many pairwise logistic models. On simulated data, our method finds more significant interactions at a lower false discovery rate, especially in the presence of main effects. On real genomic data, although there is no gold standard, our method finds apparent signal and tells a believable story, while logistic regression does not. We also give asymptotic consistency results under assumptions that are not too restrictive. Supplementary materials for this article are available online.
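As an illustration of the idea of searching for pairwise correlations that differ between classes, the following is a minimal sketch and not the procedure from the article itself. It assumes a feature matrix `X` (samples by features) and a 0/1 label vector `y`, uses the absolute difference of Pearson correlations as the statistic, and builds the null distribution by plain label permutation. The function names, the choice of statistic, and the permutation scheme are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np

def class_corr_diff(X, y):
    """Absolute difference between the class-1 and class-0 Pearson correlation matrices."""
    r0 = np.corrcoef(X[y == 0], rowvar=False)
    r1 = np.corrcoef(X[y == 1], rowvar=False)
    return np.abs(r1 - r0)

def permutation_pvalues(X, y, n_perm=200, seed=0):
    """Per-pair permutation p-values for a between-class correlation difference.

    Hypothetical helper: labels are shuffled n_perm times and the statistic
    is recomputed each time (an assumed, simplified null scheme).
    """
    rng = np.random.default_rng(seed)
    observed = class_corr_diff(X, y)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        # Count how often a permuted statistic is at least as large as the observed one.
        exceed += class_corr_diff(X, rng.permutation(y)) >= observed
    # The +1 terms keep every p-value strictly positive.
    pvals = (exceed + 1) / (n_perm + 1)
    iu = np.triu_indices(X.shape[1], k=1)  # each unordered feature pair once
    return observed[iu], pvals[iu], iu
```

The returned p-values, one per feature pair, could then be passed to a multiple-testing procedure to control the false discovery rate across all pairs.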
