Improving Type 2 Diabetes Phenotypic Classification by Combining Genetics and Conventional Risk Factors

Type 2 Diabetes is a multifactorial disorder that involves the convergence of genetic, environmental, dietary and lifestyle risk factors. This paper investigates genetic and conventional (clinical, sociodemographic) risk factors and their predictive power in classifying Type 2 Diabetes. Six statistically significant Single Nucleotide Polymorphisms (SNPs) associated with Type 2 Diabetes are identified through logistic association analysis. The identified SNPs, together with the conventional risk factors, are used to train supervised machine learning algorithms that classify cases and controls in genome-wide association study (GWAS) data. Models are trained on three sets of variables: genetic variables only, genetic and conventional variables combined, and conventional variables only. The results demonstrate that, of the three models, predictive capacity is highest when genetic and conventional predictors are combined. Using a Random Forest classifier, the Area Under the Curve is 73.96%, Sensitivity is 68.42%, and Specificity is 78.67%.
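The three evaluation measures reported above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and none of the numbers come from the study. Sensitivity is the fraction of cases correctly classified, specificity is the fraction of controls correctly classified, and the Area Under the Curve is computed here as the probability that a randomly chosen case receives a higher score than a randomly chosen control.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the share of (case, control) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 cases, 4 controls, and made-up classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.7, 0.3, 0.2, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"Sensitivity={sens:.2%}, Specificity={spec:.2%}, AUC={auc:.2%}")
```

In a comparison like the one described, the same functions would be applied to the held-out predictions of each of the three models, so that the genetic, conventional, and combined variable sets are scored on identical terms.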
