Polygenic risk modeling with latent trait-related genetic components

Polygenic risk models have led to significant advances in understanding complex diseases and their clinical presentation. While models such as polygenic risk scores (PRS) can effectively predict outcomes, they generally do not account for the disease subtypes or pathways that underlie within-trait diversity. Here, we introduce a latent factor model of genetic risk based on components from Decomposition of Genetic Associations (DeGAs), which we call the DeGAs polygenic risk score (dPRS). We compute DeGAs using genetic associations for 977 traits in the UK Biobank and find that dPRS performs comparably to standard PRS while offering greater interpretability. We show how to decompose an individual’s genetic risk for a trait across DeGAs components, highlighting specific results for body mass index (BMI), myocardial infarction (heart attack), and gout in 337,151 white British individuals, with replication in a further 25,486 non-British white individuals from the UK Biobank. We find that BMI polygenic risk factorizes into components relating to fat-free mass, fat mass, and overall health indicators such as physical activity measures. Most individuals with high dPRS for BMI have strong contributions from both a fat mass component and a fat-free mass component, whereas a few ‘outlier’ individuals have strong contributions from only one of the two. Overall, our method enables fine-scale interpretation of the drivers of genetic risk for complex traits.
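
As originally published, DeGAs factorizes a variant-by-trait matrix of association statistics with truncated singular value decomposition (TSVD). The Python sketch below is a minimal illustration, under stated assumptions, of why a low-rank factorization lets a linear polygenic score split into additive per-component contributions. Everything in it is hypothetical and illustrative. This includes the helper names, the toy matrix, and the power-iteration solver, which is a slow, dependency-free stand-in for a production TSVD routine. It also includes the choice of per-component weights s_k * v_k[t], which are taken from the rank-k reconstruction. Using association statistics directly as per-variant weights is itself a simplification. The published dPRS may instead learn component weights from individual-level data, and this sketch does not reproduce it.

```python
import math
import random


def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def matvec(A, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [dot(row, x) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def truncated_svd(W, k, iters=300, seed=0):
    """Top-k singular triplets of a variant-by-trait matrix W (hypothetical helper).

    Power iteration with deflation: a slow, dependency-free stand-in for a
    production TSVD routine. Returns U (variant loadings), S (singular values)
    and V (trait loadings), one entry per component.
    """
    rng = random.Random(seed)
    R = [row[:] for row in W]  # residual matrix, deflated after each component
    U, S, V = [], [], []
    for _ in range(k):
        Rt = transpose(R)
        v = [rng.gauss(0.0, 1.0) for _ in range(len(Rt))]
        for _ in range(iters):
            w = matvec(Rt, matvec(R, v))  # one power-iteration step on R^T R
            n = norm(w)
            v = [x / n for x in w]
        Rv = matvec(R, v)
        s = norm(Rv)
        u = [x / s for x in Rv]
        U.append(u)
        S.append(s)
        V.append(v)
        # Subtract the fitted rank-1 piece s * u v^T before the next component.
        R = [[R[i][j] - s * u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
    return U, S, V


def component_contributions(genotypes, U, S, V, trait):
    """Split one individual's low-rank score for `trait` across components.

    Contribution k is (g . u_k) * s_k * v_k[trait]; the contributions sum to
    g . W_hat[:, trait], where W_hat is the rank-k reconstruction of W.
    """
    return [dot(genotypes, u) * s * v[trait] for u, s, v in zip(U, S, V)]


def dominant_component(contributions):
    """Index of the largest |contribution| and its share of the total |contribution|."""
    total = sum(abs(c) for c in contributions)
    top = max(range(len(contributions)), key=lambda i: abs(contributions[i]))
    return top, abs(contributions[top]) / total


# Toy, made-up data: 4 variants x 3 traits of association statistics, and one
# individual's allele dosages (0, 1 or 2) at the same 4 variants.
W = [[4.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
g = [0, 1, 2, 1]

U, S, V = truncated_svd(W, k=3)
contribs = component_contributions(g, U, S, V, trait=0)
print(sum(contribs))  # close to 1.0 = g . W[:, 0], because k equals the rank of W
print(dominant_component(contribs))
```

Because W_hat = Σ_k s_k u_k v_kᵀ, the score g · W_hat[:, t] is exactly the sum of the k contributions. When k equals the rank of W, that score also equals g · W[:, t]. A share from `dominant_component` near 1 marks an individual whose score is driven by a single component. This is loosely analogous to the ‘outlier’ individuals described above. Where to draw the line for "near 1" would be an analysis choice, not something this sketch settles.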
