De novo mutational signature discovery in tumor genomes using SparseSignatures

Cancer is the result of mutagenic processes that can be inferred from tumor genomes by analyzing rate spectra of point mutations, or “mutational signatures”. Here we present SparseSignatures, a novel framework for extracting signatures from somatic point mutation data. Our approach models DNA replication error as a background signature, employs regularization to reduce noise in the non-background signatures, uses cross-validation to select the number of signatures, and scales to large datasets. We show that SparseSignatures outperforms current state-of-the-art methods on simulated data according to standard metrics. We then apply SparseSignatures to the whole-genome sequences of 147 pancreatic cancer tumors, discovering 8 signatures in addition to the background.
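To make the model structure concrete, the following is a minimal sketch, not the authors' implementation, of a factorization with the same ingredients: a mutation count matrix X (patients × 96 trinucleotide categories) is approximated as alpha @ beta, where row 0 of beta is a fixed background profile supplied by the user and the remaining rows are de novo signatures carrying an L1 (sparsity) penalty. It assumes the objective 0.5·‖X − alpha·beta‖²_F + lam·Σ beta[1:] and solves it with Lee-Seung-style multiplicative updates, a swapped-in optimizer chosen for brevity. The helper names `fit_sparse_nmf_with_background` and `objective`, the value of `LAM`, and the simulated data are all hypothetical. Cross-validation over the number of signatures is not shown.

```python
import numpy as np


def fit_sparse_nmf_with_background(X, background, n_signatures, lam=1.0,
                                   n_iter=500, seed=0, eps=1e-12):
    """Hypothetical helper: factor X (patients x categories) as alpha @ beta.

    Row 0 of beta is the fixed background and is never updated; rows 1..K
    are de novo signatures penalized by lam * sum(beta[1:]).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    k = n_signatures + 1  # background + de novo signatures
    alpha = rng.uniform(0.1, 1.0, size=(n, k))  # exposures
    beta = np.vstack([background, rng.uniform(0.1, 1.0, size=(n_signatures, m))])

    for _ in range(n_iter):
        # Exposures: standard multiplicative update (no penalty on alpha).
        alpha *= (X @ beta.T) / (alpha @ beta @ beta.T + eps)
        # De novo signatures only; the L1 penalty enters the denominator.
        num = alpha[:, 1:].T @ X
        den = alpha[:, 1:].T @ (alpha @ beta) + lam + eps
        beta[1:] *= num / den
    return alpha, beta


def objective(X, alpha, beta, lam):
    """Penalized squared reconstruction error (background row unpenalized)."""
    resid = X - alpha @ beta
    return 0.5 * np.sum(resid ** 2) + lam * np.sum(beta[1:])


# Simulated example (hypothetical data, not from the study).
rng = np.random.default_rng(1)
m = 96  # trinucleotide-context substitution categories
background = rng.dirichlet(np.ones(m))
true_sigs = rng.dirichlet(np.full(m, 0.1), size=2)  # concentrated profiles
true_alpha = rng.uniform(50, 500, size=(30, 3))
X = rng.poisson(true_alpha @ np.vstack([background, true_sigs])).astype(float)

LAM = 1.0
alpha, beta = fit_sparse_nmf_with_background(X, background, n_signatures=2, lam=LAM)
print("penalized objective:", objective(X, alpha, beta, LAM))
```

Because the penalty acts only on beta, a solver of this form can trade scale between alpha and beta; a fuller treatment would normalize the signatures or otherwise control this trade-off, which the sketch deliberately omits.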
