Abstract

We developed a pair of databases that support two important tasks: annotation of anonymous RNA transcripts and discovery of novel non-coding RNAs. The pair is called the Functional RNA Database and consists of a rewrite of the original version of the Functional RNA Database (fRNAdb) and the latest version of the UCSC GenomeBrowser for Functional RNA. The former is a sequence database equipped with a powerful search function; it hosts a large collection of known/predicted non-coding RNA sequences acquired from existing databases as well as novel/predicted sequences reported by researchers of the Functional RNA Project. The latter is a UCSC Genome Browser mirror with a large number of additional custom tracks specifically associated with non-coding elements. It also includes several functional enhancements, such as the presentation of a common secondary structure prediction for any given genomic window of up to 500 bp. Our GenomeBrowser supports user authentication and user-specific tracks. The current version of fRNAdb is a complete rewrite of the former version, hosting a larger number of sequences and offering a much friendlier interface. The current version of the UCSC GenomeBrowser for Functional RNA features a larger number of tracks and richer features than the former version. The databases are available at http://www.ncrna.org/.

INTRODUCTION

Large-scale transcription analyses such as the H-Invitational (1) and FANTOM (2) projects reported a large number of transcripts that could not be associated with coding genes and were thus left unclassifiable. Several investigations revealed that these unclassifiable transcripts contain novel non-coding genes (3–5). The Functional RNA Database (fRNAdb) 1.0 (6) focused on acquiring and providing lines of evidence for inferring the non-coding nature of these unclassifiable transcripts, to help filter out candidates for non-coding genes. However, drastic changes in the situation surrounding non-coding RNA research spurred us to move on to the next phase of database development. Transcriptome analysis of natural RNA transcripts using high-throughput sequencing is one of the most attractive topics among recent research activities. Because deep sequencing produces an abundance of sequence data, computational analysis plays an important role in the rapid mapping and annotation of anonymous sequences, and a sequence database is the most crucial part of such analysis. Total RNAs extracted from a cell tend to have diverse compositions even when the RNAs are extracted via immunoprecipitation of specific proteins (7–9). They contain tRNAs, rRNAs, coding mRNAs, varieties of transposons and non-coding RNAs including miRNAs and snoRNAs, together with a fair amount of anonymous transcripts that match no existing annotation although they can be mapped to a genome. Such transcripts may contain evidence of novel non-coding RNA genes. In order to accommodate the large-scale sequence data from deep sequencing, we have completely redesigned and rebuilt fRNAdb. The major changes include an increase in the number of hosted sequences (from 13 693 to 509 795), sequence ontology (SO, http://song.sourceforge.net/) classification, a keyword search function and a Blast search service. The next section details the new features of the current version.

fRNAdb

fRNAdb is a sequence database hosting a large collection of non-coding RNA sequence data from public non-coding databases: H-invDB rel. 5.0 (1), FANTOM3 (2), miRBase 10.0 (10), NONCODE v1.0 (11), Rfam v8.1 (12), RNAdb v2.0 (13) and snoRNA-LBME-db rel. 3 (14). Although these databases contain many identical sequences, fRNAdb consolidates them to a set of unique sequences. Therefore, one fRNAdb sequence can have multiple accessions and multiple source organisms.
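
The consolidation step can be pictured with a minimal sketch. This is not the fRNAdb implementation: the function name `consolidate`, the record layout and the accession labels are hypothetical, and treating T and U as equivalent is an assumption made here so that DNA-style and RNA-style spellings of one sequence collapse together.

```python
from collections import defaultdict


def consolidate(records):
    """Merge (accession, organism, sequence) records that share a sequence.

    Returns a dict mapping each unique sequence to the set of accessions
    and the set of source organisms under which it was registered.
    """
    merged = defaultdict(lambda: {"accessions": set(), "organisms": set()})
    for accession, organism, sequence in records:
        # Assumption: case and the T/U alphabet difference are ignored.
        key = sequence.upper().replace("T", "U")
        merged[key]["accessions"].add(accession)
        merged[key]["organisms"].add(organism)
    return dict(merged)


# Hypothetical input: two source databases report the same sequence.
records = [
    ("ACC_A", "Homo sapiens", "UGAGGUAGUAGGUUGUAUAGUU"),
    ("ACC_B", "Mus musculus", "TGAGGTAGTAGGTTGTATAGTT"),
    ("ACC_C", "Homo sapiens", "GGCUAGCUAGC"),
]
unique = consolidate(records)
for sequence, info in unique.items():
    print(sequence, sorted(info["accessions"]), sorted(info["organisms"]))
```

In this sketch the first two records end up as one entry carrying two accessions and two source organisms, which mirrors the one-to-many relationship described above.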

Each sequence can be associated with one or more mapped loci in multiple genomes, gene associations derived from the mapping information, sequence similarity to other registered sequences, and reference information. All sequences are mapped to multiple genomes (human, mouse, rat and fruit fly) in order to determine potential loci and potential homologs. The mapped loci can be inspected visually in our UCSC GenomeBrowser for Functional RNA, alongside a number of tracks showing the various genomic elements provided by the original UCSC Genome Browser and our additional tracks detailed in the next section.

fRNAdb allows users to search the sequences through keywords associated with them. Various kinds of information are associated with a sequence, as shown in Figure 1. The keywords are extracted from an identifier, description text, accession, SO, source organism, cross reference information, associated gene names, title/abstract/author text of reference papers, genome/chromosome/cytoband and sequence length. Common English words that may hinder efficient keyword search are eliminated from the index using the English dictionary of the open source spell checker aspell (http://aspell.net/).
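
A keyword index of this kind can be sketched as an inverted index with stop-word filtering. The sketch below is an illustration under stated assumptions, not the fRNAdb code: the names `tokenize`, `build_index` and `search` are hypothetical, the entries are made up, and a small hard-coded `STOP_WORDS` set stands in for the aspell English dictionary.

```python
import re
from collections import defaultdict

# Assumption: a tiny hand-written list stands in for the aspell dictionary.
STOP_WORDS = {"a", "and", "for", "in", "of", "the", "to"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens.

    Hyphenated identifiers such as "let-7a" are kept as single tokens.
    """
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def build_index(entries):
    """Map every non-stop-word token to the set of sequence IDs using it."""
    index = defaultdict(set)
    for seq_id, fields in entries.items():
        for text in fields.values():
            for token in tokenize(text):
                if token not in STOP_WORDS:
                    index[token].add(seq_id)
    return index


def search(index, query):
    """Return the IDs of sequences matching all query keywords."""
    terms = [t for t in tokenize(query) if t not in STOP_WORDS]
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


# Hypothetical entries with two of the indexed fields.
entries = {
    "SEQ1": {"description": "microRNA let-7a of the human genome",
             "organism": "Homo sapiens"},
    "SEQ2": {"description": "small nucleolar RNA in the mouse",
             "organism": "Mus musculus"},
}
index = build_index(entries)
print(search(index, "the human microRNA"))
```

Filtering stop words at indexing time keeps the index small, and filtering them again at query time means a query such as "the human microRNA" behaves like "human microRNA".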

Figure 1. Diagram showing a registered sequence and its associations to other information.

Statistics of the keywords associated with fRNAdb sequences can be browsed on the fRNAdb::Statistics page, which presents frequently used keywords corresponding to canonical terms in various ontology sets. These statistics give an overview of the entire collection of non-coding RNA sequences from multiple aspects, using different ontologies such as SO, taxonomy and several ontologies of the Open Biomedical Ontologies (http://www.obofoundry.org/): human disease ontology and gene ontology (biological/molecular processes).

fRNAdb also provides a sequence homology search using Blastn (15). To improve usability, we divided our database into two parts: one contains sequences longer than 50 bases and the other contains sequences of 50 bases or shorter, since some users are not interested in small sequences, which include a large number of deep sequencing products. fRNAdb::Blast automatically adjusts some parameters according to the length of the query sequence in order to improve performance for short (<50 bases) query sequences. The adaptive parameters are the gap opening/extension costs, the E-value and the word size. All Blast parameters can be overridden by users. More details about fRNAdb are provided on the fRNAdb::Help page.
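
The length-dependent parameter selection can be sketched as follows. Only the 50-base threshold and the list of adaptive parameters come from the text above. The names `BlastParams` and `choose_params` are hypothetical, and every numeric default is an illustrative assumption rather than a value used by fRNAdb::Blast.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BlastParams:
    word_size: int
    gap_open: int
    gap_extend: int
    evalue: float


SHORT_QUERY_LIMIT = 50  # bases; queries shorter than this count as short

# Assumption: illustrative defaults only, not the values used by fRNAdb.
SHORT_DEFAULTS = BlastParams(word_size=7, gap_open=2, gap_extend=1, evalue=1000.0)
LONG_DEFAULTS = BlastParams(word_size=11, gap_open=5, gap_extend=2, evalue=10.0)


def choose_params(query, **overrides):
    """Pick defaults from the query length, then apply user overrides."""
    base = SHORT_DEFAULTS if len(query) < SHORT_QUERY_LIMIT else LONG_DEFAULTS
    return replace(base, **overrides)


short_query = "UGAGGUAGUAGGUUGUAUAGUU"  # 22 bases
print(choose_params(short_query))
print(choose_params(short_query, evalue=1.0))  # user override wins
print(choose_params("A" * 200))
```

Keeping the defaults in immutable records and applying overrides last makes the precedence explicit: user-supplied values always replace the length-based choice.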

UCSC GENOME BROWSER FOR FUNCTIONAL RNA

This database is an extended mirror of the UCSC Genome Browser (16) hosting genomes of humans (hg17 and hg18), mice (mm9), rats (rn3) and fruit flies (dm3). This database has been updated extensively. There were 15 original tracks in the previous version (6). We re-organized our tracks and added more custom tracks. For hg18, our extension includes 26 essential tracks for the ncRNA Prediction and Mapping Tracks group, five essential tracks for the Misc. Genomic Element Tracks group, and five essential tracks for the miRNA-related Tracks group. Tracks for the whole human tiling array of Affymetrix Transfrags (17) are available (currently supported only on hg17).

We have developed several tracks to improve data presentation. For example, the miRNA Atlas (18) track can present the expression profiles of multiple miRNAs residing inside the GenomeBrowser window (Figure 2). Another example is the tissue-specific enhancers and target loci (19) track. This track indicates an enhancer region with an orange box and its associated gene locus with a green bar, which is rendered in darker green when the locus is activated in more tissues. Yet another extension is made to the conservation track, which shows not only a multiple genome alignment but also predicted common RNA secondary structures. When the user clicks on the conservation track in a window showing a genomic region of ⩽500 bp, the prediction is performed dynamically on both strands. The browser then presents the predicted secondary structure, the minimum free energy and the number of base pairs for each strand. The predicted secondary structure is downloadable as PDF graphics and in Stockholm format, an alignment file format annotated with secondary structure. This file can be used to search a database for homologous secondary structures using the Infernal software package (http://infernal.janelia.org). A complete listing and details of the extension tracks can be found on the Project Specific Custom Tracks page (http://www.ncrna.org/custom-tracks).
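
A base-pair count like the one reported with each prediction can be read directly from a structure written in bracket notation, such as the consensus structure line of a Stockholm file. The following is a minimal sketch and not the browser's own code: the function name `base_pairs` is hypothetical, and it assumes that the bracket pairs `()`, `<>`, `[]` and `{}` mark paired positions while every other character marks an unpaired position.

```python
# Each closing bracket maps to the opening bracket it must match.
CLOSERS = {")": "(", ">": "<", "]": "[", "}": "{"}


def base_pairs(structure):
    """Return the sorted list of (i, j) index pairs encoded by the brackets."""
    stacks = {opener: [] for opener in CLOSERS.values()}
    pairs = []
    for position, symbol in enumerate(structure):
        if symbol in stacks:
            # Opening bracket: remember where it was seen.
            stacks[symbol].append(position)
        elif symbol in CLOSERS:
            opener = CLOSERS[symbol]
            if not stacks[opener]:
                raise ValueError(f"unmatched {symbol!r} at position {position}")
            pairs.append((stacks[opener].pop(), position))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


structure = "((((...))))..<<<....>>>"  # made-up example structure
pairs = base_pairs(structure)
print(len(pairs), "base pairs")
print(pairs)
```

Using one stack per bracket type keeps the pairing of each type independent, so the same function handles a plain dot-bracket string and a string that mixes several bracket types.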

Figure 2. Mammalian miRNA Expression Atlas track showing miR-302a/b/c/d highly expressed at 3p (A). The detailed page shows expression profiles for these miRNAs with a heat map and actual read numbers previously reported by (20) (B).

FUNDING

This work was supported by the Functional RNA Project funded by the New Energy and Industrial Technology Development Organization (NEDO). Funding for open access charge: Japan Biological Informatics Consortium (JBIC).

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors thank everyone in the bioinformatics group of the Functional RNA Project for constructive criticisms and fruitful discussions.

REFERENCES

1. Imanishi T, Itoh T, Suzuki Y, O’Donovan C, Fukuchi S, Koyanagi KO, Barrero RA, Tamura T, Yamaguchi-Kabata Y, Tanino M, et al. Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol. 2004, vol. 2 (pg. 856-875)
2. Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, et al. The transcriptional landscape of the mammalian genome. Science 2005, vol. 309 (pg. 1559-1563)
3. Inagaki S, Numata K, Kondo T, Tomita M, Yasuda K, Kanai A, Kageyama Y. Identification and expression analysis of putative mRNA-like non-coding RNA in Drosophila. Genes Cells 2005, vol. 10 (pg. 1163-1173)
4. Sasaki YTF, Sano M, Kin T, Asai K, Hirose T. Coordinated expression of ncRNAs and HOX mRNAs in the human HOXA locus. Biochem. Biophys. Res. Comm. 2007, vol. 357 (pg. 724-730)
5. Xue C, Li F, Li F. Finding noncoding RNA transcripts from low abundance expressed sequence tags. Cell Res. 2008, vol. 18 (pg. 695-700)
6. Kin T, Yamada K, Terai G, Okida H, Yoshinari Y, Ono Y, Kojima A, Kimura Y, Komori T, Asai K. fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences. Nucleic Acids Res. 2007, vol. 35 (pg. D145-D148)
7. Kawamura Y, Saito K, Kin T, Ono Y, Asai K, Sunohara T, Okada TN, Siomi MC, Siomi H. Drosophila endogenous small RNAs bind to Argonaute 2 in somatic cells. Nature 2008, vol. 453 (pg. 793-797)
8. Czech B, Malone CD, Zhou R, Stark A, Schlingeheyde C, Dus M, Perrimon N, Kellis M, Wohlschlegel JA, Sachidanandam R, et al. An endogenous small interfering RNA pathway in Drosophila. Nature 2008, vol. 453 (pg. 798-802)
9. Okamura K, Chung WJ, Ruby JG, Guo H, Bartel DP, Lai EC. The Drosophila hairpin RNA pathway generates endogenous short interfering RNAs. Nature 2008, vol. 453 (pg. 803-806)
10. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008, vol. 36 (pg. D154-D158)
11. He S, Liu C, Skogerbø G, Zhao H, Wang J, Liu T, Bai B, Zhao Y, Chen R. NONCODE v2.0: decoding the non-coding. Nucleic Acids Res. 2008, vol. 36 (pg. D170-D172)
12. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005, vol. 33 (pg. D121-D124)
13. Pang KC, Stephen S, Dinger ME, Engström PG, Lenhard B, Mattick JS. RNAdb 2.0--an expanded database of mammalian non-coding RNAs. Nucleic Acids Res. 2007, vol. 35 (pg. D178-D182)
14. Lestrade L, Weber MJ. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res. 2006, vol. 34 (pg. D158-D162)
15. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 1990, vol. 215 (pg. 403-410)
16. Kuhn RM, Karolchik D, Zweig AS, Trumbower H, Thomas DJ, Thakkapallayil A, Sugnet CW, Stanke M, Smith KE, Siepel A, et al. The UCSC genome browser database: update 2007. Nucleic Acids Res. 2007, vol. 35 (pg. D668-D673)
17. Kapranov P, Cheng J, Dike S, Nix DA, Duttagupta R, Willingham AT, Stadler PF, Hertel J, Hackermüller J, Hofacker IL, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 2007, vol. 316 (pg. 1484-1488)
18. Landgraf P, Rusu M, Sheridan R, Sewer A, Iovino N, Aravin A, Pfeffer S, Rice A, Kamphorst AO, Landthaler M, et al. A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 2007, vol. 129 (pg. 1401-1414)
19. Pennacchio LA, Loots GG, Nobrega MA, Ovcharenko I. Predicting tissue-specific enhancers in the human genome. Genome Res. 2007, vol. 17 (pg. 201-211)
20. Landgraf P, Rusu M, Sheridan R, Sewer A, Iovino N, Aravin A, Pfeffer S, Rice A, Kamphorst AO, Landthaler M, et al. A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 2007, vol. 129 (pg. 1401-1414)

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
