- Split View
-
Views
-
Cite
Cite
Toutai Mituyama, Kouichirou Yamada, Emi Hattori, Hiroaki Okida, Yukiteru Ono, Goro Terai, Aya Yoshizawa, Takashi Komori, Kiyoshi Asai, The Functional RNA Database 3.0: databases to support mining and annotation of functional RNAs, Nucleic Acids Research, Volume 37, Issue suppl_1, 1 January 2009, Pages D89–D92, https://doi.org/10.1093/nar/gkn805
- Share Icon Share
Abstract
We developed a pair of databases that support two important tasks: annotation of anonymous RNA transcripts and discovery of novel non-coding RNAs. The database combo is called the Functional RNA Database and consists of two databases: a rewrite of the original version of the Functional RNA Database (fRNAdb) and the latest version of the UCSC GenomeBrowser for Functional RNA. The former is a sequence database equipped with a powerful search function and hosts a large collection of known/predicted non-coding RNA sequences acquired from existing databases as well as novel/predicted sequences reported by researchers of the Functional RNA Project. The latter is a UCSC Genome Browser mirror with large additional custom tracks specifically associated with non-coding elements. It also includes several functional enhancements such as a presentation of a common secondary structure prediction at any given genomic window ⩽500 bp. Our GenomeBrowser supports user authentication and user-specific tracks. The current version of the fRNAdb is a complete rewrite of the former version, hosting a larger number of sequences and with a much friendlier interface. The current version of UCSC GenomeBrowser for Functional RNA features a larger number of tracks and richer features than the former version. The databases are available at http://www.ncrna.org/.
INTRODUCTION
Large-scale transcription analyses such as the H-invitational (1) and Fantom (2) projects reported a large number of transcripts that could not be associated with coding genes, and which were thus left unclassifiable. Several investigations revealed that these unclassifiable transcripts contain novel non-coding genes (3–5). The Functional RNA Database (fRNAdb) 1.0 (6) focused on acquiring and providing lines of evidence to infer non-coding-ness for these unclassifiable transcripts to help filter out candidates for non-coding genes. However, drastic changes in the situation surrounding non-coding RNA research spurred us to move on to the next phase of database development. A transcriptome analysis for natural RNA transcripts utilizing high-throughput sequencing is one of the most attractive topics among recent research activities. Due to the abundance of sequence data produced by deep sequencing, computational analysis plays an important role in the rapid sequence mapping and annotation of anonymous sequences. In particular, a sequence database is the most crucial part of computational analysis. Total RNAs extracted from a cell tend to have diverse compositions even though RNAs are extracted via immunoprecipitation of specific proteins (7–9). They contain tRNAs, rRNAs, coding mRNAs, varieties of transposons and non-coding RNAs including miRNAs and snoRNAs together with a fair amount of anonymous transcripts meeting no existing annotations although they can be mapped to a genome. Such transcripts may contain evidence of novel non-coding RNA genes. In order to adopt the large-scale sequence data from deep sequencing, we have completely redesigned and rebuilt fRNAdb. The major changes include increase of hosting sequences (from 13 693 to 509 795), sequence ontology (SO, http://song.sourceforge.net/) classification, keyword search function and Blast search service. The details given in the next section are new features for the current version.
fRNAdb
fRNAdb is a sequence database hosting a large collection of non-coding RNA sequence data from public non-coding databases: H-invDB rel. 5.0 (1), FANTOM3 (2), miRBase 10.0 (10), NONCODE v1.0 (11), Rfam v8.1 (12), RNAdb v2.0 (13) and snoRNA-LBME-db rel. 3 (14). Although these databases contain many identical sequences, fRNAdb consolidates them to a set of unique sequences. Therefore, one fRNAdb sequence can have multiple accessions and multiple source organisms.
A sequence can have one or more mapping loci in multiple genomes, gene association using mapping information, sequence similarity information between other registered sequences, and reference information. All sequences are mapped to multiple genomes (humans, mice, rats and fruit flies) in order to determine potential loci and potential homologs. The mapping loci can be viewed in our UCSC GenomeBrowser for Functional RNA for visual inspection with a number of tracks showing versatile genomic elements provided by the original UCSC Genome Browser and our additional tracks detailed in the next section.
fRNAdb allows users to search the sequences through keywords associated with them. Various kinds of information are associated with a sequence, as shown in Figure 1. The keywords are extracted from an identifier, description text, accession, SO, source organism, cross reference information, associated gene names, title/abstract/author text of reference papers, genome/chromosome/cytoband and sequence length. Common English words that may hinder efficient keyword search are eliminated from the index using the English dictionary of the open source spell checker aspell (http://aspell.net/).
Statistics of keywords associated with fRNAdb sequences can be browsed at the fRNAdb::Statistics page, where frequently used keywords corresponding to canonical terms in various ontology sets are presented. These statistics are useful for providing an overview of the entire non-coding RNA sequences from multiple aspects using different ontologies such as SO, taxonomy and several ontologies of the Open Biomedical Ontologies (http://www.obofoundry.org/): human disease ontology and gene ontology (biological/molecular processes).
fRNAdb also provides sequence homology search using Blastn (15). In order to provide better usability, we divided our database in two parts: one contains sequences longer than 50 bases and the other contains sequences 50 bases or shorter since some users are not interested in small sequences that include a large number of deep sequencing products. fRNAdb::Blast automatically adjusts some parameters according to the length of a query sequence in order to improve performance for short (<50 bases) query sequences. The adaptive parameters are gap opening/extension cost, E-value, and word size. All Blast parameters can be overridden by users. More details about fRNAdb are provided on the fRNAdb::Help page.
UCSC GENOME BROWSER FOR FUNCTIONAL RNA
This database is an extended mirror of the UCSC Genome Browser (16) hosting genomes of humans (hg17 and hg18), mice (mm9), rats (rn3) and fruit flies (dm3). This database has been updated extensively. There were 15 original tracks in the previous version (6). We re-organized our tracks and added more custom tracks. For hg18, our extension includes 26 essential tracks for the ncRNA Prediction and Mapping Tracks group, five essential tracks for the Misc. Genomic Element Tracks group, and five essential tracks for the miRNA-related Tracks group. Tracks for the whole human tiling array of Affymetrix Transfrags (17) are available (currently supported only on hg17).
We have developed several tracks to support an improved presentation. For example, the miRNA Atlas (18) track has a feature to present the expression profile of multiple miRNAs residing inside the GenomeBrowser window (Figure 2). Another example is tissue-specific enhancers and the target loci (19) track. This track indicates an enhancer region with an orange box and its associated gene locus with a green bar, which is rendered in darker green when the locus is activated in more tissues. Yet another extension is given to the conservation track, which shows not only a multiple genome alignment but also predicted common RNA secondary structures. When clicking on the conservation track in the window showing a genomic region ⩽500 bp, prediction is dynamically perfomed in both strands. Then, the browser presents a predicted secondary structure, minimum free energy and the number of base pairs per strand. The estimated secondary structure is downloadable as PDF graphics and in Stockholm format, which is a secondary structure annotated alignment file. This file can be used for determining homologous secondary structure in a database using Infernal software package (http://infernal.janelia.org). Complete listing and details of extension tracks are found in the Project Specific Custom Tracks page (http://www.ncrna.org/custom-tracks).
FUNDING
This work was supported by the Functional RNA Project funded by New Energy and Industrial Technology Development Organization (NEDO). Funding for open access charge: Japan Biological Informatics Consortium (JBIC).
Conflict of interest statement. None declared.
ACKNOWLEDGEMENTS
The authors thank everyone in the bioinformatics group of the Functional RNA Project for constructive criticisms and fruitful discussions.
Comments