In silico proof of principle of machine learning-based antibody design at unconstrained scale

Generative machine learning (ML) has been postulated to be a major driver in the computational design of antigen-specific monoclonal antibodies (mAbs). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework that incorporates a wide range of physiological antibody binding parameters. The simulation framework both computes antibody-antigen 3D structures and serves as an oracle for unrestricted prospective evaluation of the antigen specificity of ML-generated antibody sequences. We found that a deep generative model trained exclusively on antibody sequence (1D) data can be used to design native-like, conformational (3D) epitope-specific antibodies that match or exceed the training dataset in affinity and developability variety. Furthermore, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Finally, we validated that the antibody design insight gained from simulated antibody-antigen binding data is applicable to experimental real-world data. Our work establishes the a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.
Highlights

A large-scale dataset of 70M synthetic antibody-antigen complexes (3 orders of magnitude larger than the current state of the art) that reflect biological complexity allows the prospective evaluation of antibody generative deep learning.

The combination of generative learning, synthetic antibody-antigen binding data, and prospective evaluation shows that deep learning-driven antibody design and discovery at an unconstrained level is feasible.

Transfer learning (low-N learning) coupled to generative learning shows that antibody-binding rules may be transferred across unrelated antibody-antigen complexes.

Experimental validation of antibody-design conclusions drawn from deep learning on synthetic antibody-antigen binding data.

Graphical abstract

We leverage large synthetic ground-truth data to demonstrate (A, B) the unconstrained deep generative learning-based generation of native-like antibody sequences and (C) the prospective evaluation of conformational (3D) affinity, paratope-epitope pairs, and developability. (D) Finally, we show increased generation quality of low-N-based machine learning models via transfer learning.
