Gene-Gene Interactions Detection Using a Two-stage Model

Genome-wide association studies (GWAS) have discovered numerous loci involved in genetic traits. Virtually all of these studies report associations between individual single nucleotide polymorphisms (SNPs) and traits. However, complex traits are likely influenced by interactions among multiple SNPs. One way to detect SNP interactions is the brute force approach, which performs a pairwise association test between a trait and each pair of SNPs. This approach is often computationally infeasible because the number of SNP pairs grows quadratically with the number of SNPs collected in current GWAS. We propose a two-stage model, Threshold-based Efficient Pairwise Association Approach (TEPAA), that reduces the number of tests needed while maintaining almost identical power to the brute force approach. In the first stage, our method performs a single-marker test on all SNPs and selects the subset of SNPs that reach a given significance threshold. In the second stage, it performs a pairwise association test between the trait and pairs of the SNPs selected in the first stage. The key insight of our approach is that we derive the joint distribution of the association statistic of a single SNP and the association statistic of a pair of SNPs. This joint distribution allows us to guarantee that the statistical power of our approach closely approximates that of the brute force approach. We applied our approach to the Northern Finland Birth Cohort data and achieved a 63-fold speedup while maintaining 99% of the power of the brute force approach.
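The two-stage idea can be illustrated with a minimal Python sketch. It is not the TEPAA implementation. Several things here are assumptions made for illustration: the function names, the use of a correlation-based z-score as the single-marker statistic, the use of the genotype product as the pairwise statistic, and the two z-score cutoffs `stage1_z` and `stage2_z`. In particular, the sketch takes the stage-1 threshold as a given input, whereas the abstract describes choosing it from a derived joint distribution so that power is preserved; that derivation is not reproduced here.

```python
import itertools
import math


def z_statistic(x, y):
    """Association z-score: sqrt(n) times the Pearson correlation of x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return math.sqrt(n) * sxy / math.sqrt(sxx * syy)


def two_stage_scan(genotypes, trait, stage1_z, stage2_z):
    """Hypothetical two-stage scan.

    genotypes: list of SNPs, each a list of 0/1/2 allele counts per individual.
    trait:     list of quantitative trait values, one per individual.
    Returns (indices selected in stage 1, list of (i, j, z) pairwise hits).
    """
    # Stage 1: single-marker test on every SNP; keep those passing the cutoff.
    selected = [
        j for j, snp in enumerate(genotypes)
        if abs(z_statistic(snp, trait)) >= stage1_z
    ]

    # Stage 2: pairwise test only among the SNPs that survived stage 1.
    # Assumption: the pair statistic is the z-score of the genotype product.
    hits = []
    for i, j in itertools.combinations(selected, 2):
        product = [a * b for a, b in zip(genotypes[i], genotypes[j])]
        z = z_statistic(product, trait)
        if abs(z) >= stage2_z:
            hits.append((i, j, z))
    return selected, hits


def pair_test_counts(num_snps, num_selected):
    """Number of pairwise tests: brute force versus stage 2 of the sketch."""
    brute_force = num_snps * (num_snps - 1) // 2
    two_stage = num_selected * (num_selected - 1) // 2
    return brute_force, two_stage
```

For a sense of scale, `pair_test_counts(1000, 50)` returns `(499500, 1225)`: restricting stage 2 to 50 of 1000 SNPs cuts the pairwise tests from 499,500 to 1,225, at the cost of 1,000 single-marker tests in stage 1. How much power is lost depends on the stage-1 threshold, which is the quantity the joint distribution in the abstract is meant to control.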
