Hidden treasures in unspliced EST data

Several classes of exclusively (or at least predominantly) unspliced non-coding RNAs have been described in recent years, including totally and partially intronic transcripts and long intergenic RNAs. Functionally, they appear to be involved in regulating gene expression, at least in part by associating with chromatin. Intron-less transcripts have received little attention, even though recent findings indicate that intron-less protein-coding genes have several features that set them apart from the more abundant and much better understood spliced mRNAs. Even less is known about unspliced non-coding transcripts. We therefore systematically analyze the distribution of unspliced ESTs in the human genome. These ESTs form a large source of transcriptomic data that is almost always excluded from detailed studies. Most unspliced ESTs appear in clusters that overlap, or lie in the close vicinity of, annotated RefSeq genes. Partially intronic unspliced ESTs show complex patterns of overlap with the intron/exon structure of the corresponding RefSeq genes. Distinctive patterns of CAGE tags indicate that a large class of unspliced EST clusters forms long extensions of 3′UTRs, at least several hundred of which probably also appear as independent 3′UTR-associated RNAs.
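The abstract does not detail the computational pipeline. As a rough illustration only, the Python sketch below shows one way the core steps could look: flag an EST alignment as unspliced when none of its internal gaps is intron-sized, merge overlapping unspliced ESTs on the same strand into clusters, and place a cluster relative to a RefSeq-style exon model. All of this is an assumption rather than the authors' method: the `Alignment` record, the hypothetical 50-nt `MIN_INTRON` cut-off, the greedy merge, and the category labels returned by `classify` are illustrative choices. Coordinates are half-open, and real input would come from EST alignments in a format such as PSL.

```python
from dataclasses import dataclass
from typing import List, Tuple

Block = Tuple[int, int]  # half-open genomic interval [start, end)

# Hypothetical cut-off (an assumption): gaps of at least this many bases between
# aligned blocks are treated as introns; shorter gaps are assumed to be indels.
MIN_INTRON = 50


@dataclass
class Alignment:
    name: str
    chrom: str
    strand: str
    blocks: List[Block]  # aligned blocks sorted by start coordinate


def is_unspliced(aln: Alignment, min_intron: int = MIN_INTRON) -> bool:
    """True if no gap between consecutive aligned blocks is intron-sized."""
    return all(nxt[0] - cur[1] < min_intron
               for cur, nxt in zip(aln.blocks, aln.blocks[1:]))


def cluster(alns: List[Alignment]) -> List[Tuple[str, str, int, int, List[str]]]:
    """Greedily merge overlapping alignments on the same chromosome and strand."""
    merged: list = []
    for aln in sorted(alns, key=lambda a: (a.chrom, a.strand, a.blocks[0][0])):
        start, end = aln.blocks[0][0], aln.blocks[-1][1]
        last = merged[-1] if merged else None
        if last and last[:2] == [aln.chrom, aln.strand] and start < last[3]:
            last[3] = max(last[3], end)  # extend the current cluster
            last[4].append(aln.name)
        else:
            merged.append([aln.chrom, aln.strand, start, end, [aln.name]])
    return [tuple(c) for c in merged]


def overlap(a: Block, b: Block) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def classify(region: Block, exons: List[Block], strand: str) -> str:
    """Hypothetical labels for a region vs. one gene model (sorted, non-overlapping exons)."""
    gene = (exons[0][0], exons[-1][1])
    if overlap(region, gene) == 0:
        return "outside"
    introns = [(a[1], b[0]) for a, b in zip(exons, exons[1:])]
    exonic = sum(overlap(region, e) for e in exons)
    intronic = sum(overlap(region, i) for i in introns)
    left = max(0, min(region[1], gene[0]) - region[0])   # bases before gene start
    right = max(0, region[1] - max(region[0], gene[1]))  # bases after gene end
    if intronic > 0:
        return "intronic" if exonic == 0 and left == right == 0 else "partially intronic"
    if left == right == 0:
        return "exonic"
    # The 3' end lies at the higher coordinate on "+" and the lower one on "-".
    three_prime = right if strand == "+" else left
    return "3' overhang" if three_prime > 0 else "5' overhang"
```

The strand-aware merge behaves much like `bedtools merge -s` on sorted intervals, except that book-ended intervals (one ending exactly where the next starts) are kept apart here. Because the overhang labels follow the gene's strand, a cluster running past the downstream end of a gene on either strand is flagged as a candidate 3′UTR extension.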
