Gene-Gene Interactions Detection Using a Two-Stage Model

Genome-wide association studies (GWAS) have discovered numerous loci involved in genetic traits. Virtually all studies have reported associations between individual single nucleotide polymorphisms (SNPs) and traits. However, complex traits are likely influenced by interactions among multiple SNPs. One way to detect such interactions is the brute force approach, which performs a pairwise association test between a trait and each pair of SNPs. This approach is often computationally infeasible because of the large number of SNPs collected in current GWAS. We propose a two-stage model, the Threshold-based Efficient Pairwise Association Approach (TEPAA), to reduce the number of tests needed while maintaining almost identical power to the brute force approach. In the first stage, our method performs a single-marker test on all SNPs and selects the subset of SNPs that reach a given significance threshold. In the second stage, we perform a pairwise association test between the trait and pairs of the SNPs selected in the first stage. The key insight of our approach is that we derive the joint distribution of the association statistic of a single SNP and the association statistic of a pair of SNPs. This joint distribution allows us to guarantee that the statistical power of our approach closely approximates that of the brute force approach. We applied our approach to the Northern Finland Birth Cohort data and achieved a 63-fold speedup while maintaining 99% of the power of the brute force approach.
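The two-stage filtering idea described above can be illustrated with a minimal sketch. This is not the TEPAA implementation: the function names (`z_scores`, `two_stage_scan`), the correlation-based z-statistic, the product-of-genotypes interaction term, and both thresholds are assumptions chosen for illustration, and the sketch does not include the joint-distribution derivation that underlies the power guarantee.

```python
import itertools

import numpy as np


def z_scores(features, trait):
    """Approximate z-statistic sqrt(n) * Pearson r for each column of `features`.

    Illustrative statistic only (an assumption, not the paper's test).
    Constant columns get a score of 0 instead of dividing by zero.
    """
    n = features.shape[0]
    f = features - features.mean(axis=0)
    t = trait - trait.mean()
    num = (f * t[:, None]).sum(axis=0)
    denom = np.sqrt((f ** 2).sum(axis=0) * (t ** 2).sum())
    r = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    return np.sqrt(n) * r


def two_stage_scan(genotypes, trait, stage1_z, stage2_z):
    """Return (selected SNP indices, list of (i, j, z) pairwise hits).

    genotypes: (n_individuals, n_snps) float array of minor-allele counts.
    stage1_z, stage2_z: hypothetical |z| thresholds for the two stages.
    """
    # Stage 1: single-marker test on every SNP, keep those above the threshold.
    single = z_scores(genotypes, trait)
    selected = np.flatnonzero(np.abs(single) >= stage1_z)

    # Stage 2: pairwise test restricted to pairs of selected SNPs.
    hits = []
    for i, j in itertools.combinations(selected, 2):
        product = (genotypes[:, i] * genotypes[:, j])[:, None]
        z = z_scores(product, trait)[0]
        if abs(z) >= stage2_z:
            hits.append((int(i), int(j), float(z)))
    return selected, hits
```

Under this sketch's assumption that only pairs of selected SNPs are tested, a scan over m SNPs with k survivors of the first stage runs m single-marker tests plus k(k-1)/2 pairwise tests, compared with m(m-1)/2 pairwise tests for the brute force approach.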
