SRSF shape analysis for sequencing data reveal new differentiating patterns

MOTIVATION Sequencing-based methods for examining fundamental features of the genome, such as gene expression and chromatin structure, rely on inferences drawn from the abundance and distribution of reads generated by Illumina sequencing. Sound inference from such experiments requires mathematical methods that appropriately model the distribution of reads along the genome, which has proven challenging because of the scale and nature of these data.

RESULTS We propose a new framework, SRSFseq, that uses square root slope function (SRSF) shape analysis to analyse Illumina sequencing data. In this approach the basic unit of information is the density of mapped reads over a region of interest on the known reference genome. These densities are interpreted as shapes, and a new shape analysis model is proposed. An analogue of the Fisher test is used to quantify the significance of differences in read distribution patterns between groups of density functions from different experimental conditions. We evaluated the framework on RNA-seq data at the exon level, where it detected variation in read distributions and abundances between experimental conditions that other methods did not detect. The method is therefore a suitable supplement to state-of-the-art count-based techniques. The variety of available density representations and the flexibility of the mathematical design allow the model to be adapted easily to other data types, or to other problems in which the distribution of reads is to be tested. Finally, the functional interpretation of the data, combined with the SRSF phase-amplitude separation technique, provides an efficient noise-reduction procedure that improves the sensitivity and specificity of the method.
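The pipeline outlined in the abstract (read densities over a region, an SRSF representation, and a Fisher-type test between conditions) can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the SRSFseq implementation. The densities here are assumed to be plain histograms of read start positions. The SRSF q = sign(f') * sqrt(|f'|) is computed with forward differences. The test statistic is assumed to be a standard one-way functional ANOVA F ratio on the SRSFs, with significance assessed by permuting condition labels. The phase-amplitude separation (elastic alignment) step is omitted entirely. All function names (`read_density`, `srsf`, `functional_f`, `permutation_pvalue`) and the demo exon coordinates are hypothetical.

```python
import math
import random


def read_density(starts, region_start, region_end, n_bins):
    """Histogram of read start positions over [region_start, region_end),
    rescaled to integrate to 1 on the unit interval (bin width 1/n_bins)."""
    width = (region_end - region_start) / n_bins
    counts = [0] * n_bins
    for s in starts:
        if region_start <= s < region_end:
            # min() guards against float rounding pushing s into bin n_bins
            counts[min(int((s - region_start) / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c * n_bins / total for c in counts]


def srsf(f, h):
    """Square root slope function q = sign(f') * sqrt(|f'|) of a function
    sampled on a uniform grid with spacing h (forward differences)."""
    q = []
    for a, b in zip(f, f[1:]):
        d = (b - a) / h
        q.append(math.copysign(math.sqrt(abs(d)), d))
    return q


def functional_f(groups, h):
    """One-way functional ANOVA F ratio for curves sampled with spacing h.
    groups: at least two groups, each a list of equal-length curves."""
    curves = [c for g in groups for c in g]
    n_total, k, m = len(curves), len(groups), len(curves[0])
    grand = [sum(c[j] for c in curves) / n_total for j in range(m)]
    ssb = ssw = 0.0
    for g in groups:
        mean_g = [sum(c[j] for c in g) / len(g) for j in range(m)]
        # Riemann sums approximate the integrals over the region
        ssb += len(g) * h * sum((mean_g[j] - grand[j]) ** 2 for j in range(m))
        for c in g:
            ssw += h * sum((c[j] - mean_g[j]) ** 2 for j in range(m))
    if ssw == 0.0:
        return math.inf if ssb > 0.0 else 0.0
    return (ssb / (k - 1)) / (ssw / (n_total - k))


def permutation_pvalue(groups, h, n_perm=999, seed=0):
    """p-value for functional_f obtained by randomly reassigning curves to
    groups while keeping the group sizes fixed."""
    rng = random.Random(seed)
    observed = functional_f(groups, h)
    curves = [c for g in groups for c in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(curves)
        perm, i = [], 0
        for n in sizes:
            perm.append(curves[i:i + n])
            i += n
        # small relative tolerance absorbs summation-order rounding differences
        if functional_f(perm, h) >= observed * (1 - 1e-9):
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical exon at positions 1000-1200, three replicates per condition:
    # condition A piles reads towards the region start, B spreads them evenly.
    rng = random.Random(1)
    n_bins = 20

    def replicate(skewed):
        starts = [1000 + 200 * (rng.random() ** 2 if skewed else rng.random())
                  for _ in range(500)]
        return srsf(read_density(starts, 1000, 1200, n_bins), 1.0 / n_bins)

    groups = [[replicate(True) for _ in range(3)],
              [replicate(False) for _ in range(3)]]
    print("F =", functional_f(groups, 1.0 / n_bins))
    print("p =", permutation_pvalue(groups, 1.0 / n_bins))
```

Because the F ratio is a quotient of two integrated sums of squares, the grid spacing `h` cancels. It is kept so that each sum remains a proper Riemann approximation of an integral. A permutation test was chosen here because it needs no distributional assumptions about the SRSF curves. With only a handful of replicates, however, the smallest attainable p-value is bounded by the number of distinct label assignments.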
