Integration of Mechanistic Immunological Knowledge into a Machine Learning Pipeline Increases Predictive Power

The dense network of interconnected cellular signaling responses quantifiable in peripheral immune cells provides a wealth of actionable immunological insights. High-throughput single-cell profiling techniques, including polychromatic flow and mass cytometry, have matured to the point of enabling detailed immune profiling of patients in numerous clinical settings. However, limited cohort sizes combined with the high dimensionality of the data increase the risk of false-positive discoveries and model overfitting. We introduce a machine learning platform, the immunological Elastic-Net (iEN), which incorporates immunological knowledge directly into the predictive models. Importantly, the algorithm maintains the exploratory nature of the high-dimensional dataset, allowing immune features with strong predictive power to be included even if they are not consistent with prior knowledge. In three independent studies using mass cytometry data generated from whole blood, as well as in a large simulated dataset, our method demonstrates improved predictive power for clinically relevant outcomes.
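The idea of guiding an elastic net with prior knowledge while keeping it exploratory can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one plausible scheme, in which each feature is rescaled by a weight derived from a prior score before a standard elastic net is fitted. The function names, the `prior` vector (scores in [0, 1]), the trust parameter `phi`, and the exponential weighting are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def elastic_net_cd(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Plain elastic net fitted by cyclic coordinate descent.

    Minimizes (1/2n)*||y - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    No intercept is fitted; center X and y beforehand if needed.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot carry signal
            resid += X[:, j] * beta[j]           # remove feature j's contribution
            rho = X[:, j] @ resid / n            # correlation with partial residual
            beta[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid -= X[:, j] * beta[j]           # add the updated contribution back
    return beta


def prior_weighted_elastic_net(X, y, prior, phi=1.0, **kwargs):
    """Elastic net on prior-rescaled features (illustrative assumption).

    prior: per-feature scores in [0, 1]; 1 means consistent with prior knowledge.
    phi:   how strongly the prior is trusted; phi=0 recovers the plain elastic net.
    """
    # Hypothetical weighting: 1 for prior=1, exp(-phi) for prior=0.
    scale = np.exp(-phi * (1.0 - np.asarray(prior, dtype=float)))
    beta_scaled = elastic_net_cd(X * scale, y, **kwargs)
    # Map coefficients back to the original feature scale.
    return beta_scaled * scale
```

Because down-weighted features are shrunk in scale rather than removed, a feature that conflicts with the prior can still receive a nonzero coefficient if its association with the outcome is strong enough. In practice `phi` would be tuned by cross-validation alongside `lam` and `alpha`.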
