Splice Expression Variation Analysis (SEVA) for Inter-tumor Heterogeneity of Gene Isoform Usage in Cancer

Motivation: Current bioinformatics methods for detecting changes in gene isoform usage between phenotypes compare the relative expected isoform usage in each phenotype. These statistics model differences in isoform usage in normal tissues, which have stable regulation of gene splicing. Pathological conditions, such as cancer, can have broken regulation of splicing that increases the heterogeneity of the expression of splice variants. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches.

Results: We introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g. tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. Simulated data show that SEVA is unique in modeling heterogeneity of gene isoform usage. We use the same simulations to benchmark SEVA's performance against EBSeq, DiffSplice and rMATS, which model differential isoform usage rather than heterogeneity. We confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer demonstrated two findings: an unanticipated similarity between the heterogeneity of gene isoform usage in HPV-positive and HPV-negative subtypes, and an anticipated increase in heterogeneity among HPV-negative samples with mutations in genes that regulate the splice variant machinery. These results show that SEVA accurately models differential heterogeneity of gene isoform usage from RNA-seq data.

Availability and implementation: SEVA is implemented in the R/Bioconductor package GSReg.

Contact: bahman@jhu.edu or favorov@sensi.org or ejfertig@jhmi.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.
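The rank-based comparison of within-condition variability described in the Results can be illustrated with a minimal Python sketch. It is not the GSReg implementation. It assumes a Kendall-tau-style dissimilarity between the junction expression profiles of two samples and swaps in a simple permutation test for significance, neither of which is specified in the abstract. All function names and the toy junction counts are hypothetical.

```python
from itertools import combinations
import random


def kendall_tau_distance(x, y):
    """Fraction of junction pairs whose relative order differs between two samples."""
    pairs = list(combinations(range(len(x)), 2))
    discordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return discordant / len(pairs)


def mean_pairwise_distance(samples):
    """Average rank dissimilarity over all sample pairs within one condition."""
    dists = [kendall_tau_distance(a, b) for a, b in combinations(samples, 2)]
    return sum(dists) / len(dists)


def variability_difference(group_a, group_b):
    """Positive when group_a is more heterogeneous than group_b."""
    return mean_pairwise_distance(group_a) - mean_pairwise_distance(group_b)


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Two-sided permutation test (an assumption of this sketch, not SEVA's test)."""
    rng = random.Random(seed)
    observed = variability_difference(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign condition labels at random
        stat = variability_difference(pooled[:n_a], pooled[n_a:])
        if abs(stat) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical junction counts for one gene: rows are samples, columns are junctions.
normal = [[10, 20, 30, 40], [11, 19, 33, 41], [9, 22, 29, 45], [12, 21, 31, 39]]
tumor = [[10, 20, 30, 40], [40, 30, 20, 10], [20, 10, 40, 30], [30, 40, 10, 20]]

if __name__ == "__main__":
    diff, p = permutation_p_value(tumor, normal)
    print(f"variability difference = {diff:.3f}, permutation p = {p:.3f}")
```

Because the dissimilarity depends only on the relative ordering of junctions within each sample, a uniform change in the overall expression of a gene leaves the statistic unchanged. This is the property that separates a heterogeneity comparison from a differential expression comparison.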
