The Multivariate Normal Distribution Framework for Analyzing Association Studies

Genome-wide association studies (GWAS) have discovered thousands of variants involved in common human diseases. In these studies, frequencies of genetic variants are compared between a cohort of individuals with a disease (cases) and a cohort of healthy individuals (controls). Any variant that has a significantly different frequency between the two cohorts is considered an associated variant. A challenge in the analysis of GWAS is that human population history causes nearby genetic variants in the genome to be correlated with each other. In this review, we demonstrate how to use the multivariate normal (MVN) distribution to explicitly take into account the correlation between genetic variants in a comprehensive framework for analysis of GWAS. We show how the MVN framework can be applied to perform association testing, correct for multiple hypothesis testing, estimate statistical power, and perform fine mapping and imputation.
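To make one of these applications concrete, the following is a minimal Python sketch of multiple testing correction under an MVN null model. It is an illustration under stated assumptions, not the implementation described in the review: it assumes that, under the null hypothesis, the association z-scores at m variants jointly follow an MVN distribution with zero mean and covariance equal to the pairwise correlation (LD) matrix of the variants. The function names `exchangeable_ld` and `mvn_pvalue_threshold`, the toy correlation structure, and the number of Monte Carlo samples are all hypothetical choices made for this example.

```python
import math

import numpy as np


def exchangeable_ld(m, rho):
    """Toy LD matrix (an assumption for illustration): m variants,
    every pair correlated at rho, ones on the diagonal."""
    ld = np.full((m, m), float(rho))
    np.fill_diagonal(ld, 1.0)
    return ld


def mvn_pvalue_threshold(ld, alpha=0.05, n_samples=20000, seed=0):
    """Estimate the per-variant p-value threshold that keeps the
    family-wise error rate at alpha, assuming null z-scores ~ MVN(0, ld)."""
    rng = np.random.default_rng(seed)
    m = ld.shape[0]
    # Draw null z-score vectors that carry the correlation between variants.
    z = rng.multivariate_normal(np.zeros(m), ld, size=n_samples)
    # The most extreme statistic in each simulated null study.
    max_abs_z = np.abs(z).max(axis=1)
    # Critical value exceeded by the maximum in a fraction alpha of null studies.
    z_crit = float(np.quantile(max_abs_z, 1.0 - alpha))
    # Convert the critical value to a two-sided p-value for a standard normal.
    return math.erfc(z_crit / math.sqrt(2.0))


# Ten independent variants versus ten variants in very strong LD.
thr_indep = mvn_pvalue_threshold(exchangeable_ld(10, 0.0))
thr_corr = mvn_pvalue_threshold(exchangeable_ld(10, 0.99))
print(f"independent variants: p < {thr_indep:.4g}")
print(f"correlated variants:  p < {thr_corr:.4g}")
```

With independent variants the estimated threshold should land near the Bonferroni or Sidak value of roughly alpha divided by m, while strongly correlated variants behave like fewer effective tests and allow a less stringent threshold. Accounting for the correlation in this way is what avoids an overly conservative correction.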
