The Functional RNA Database 3.0: databases to support mining and annotation of functional RNAs

Abstract

We developed a pair of databases that support two important tasks: annotation of anonymous RNA transcripts and discovery of novel non-coding RNAs. Together they are called the Functional RNA Database and comprise a rewrite of the original Functional RNA Database (fRNAdb) and the latest version of the UCSC GenomeBrowser for Functional RNA. The former is a sequence database with a powerful search function; it hosts a large collection of known and predicted non-coding RNA sequences acquired from existing databases, as well as novel and predicted sequences reported by researchers of the Functional RNA Project. The latter is a UCSC Genome Browser mirror with a large set of additional custom tracks specifically associated with non-coding elements. It also includes several functional enhancements, such as presentation of a common secondary structure prediction for any genomic window ⩽500 bp, and it supports user authentication and user-specific tracks. Compared with the former version, the current fRNAdb hosts a larger number of sequences and offers a much friendlier interface, and the current UCSC GenomeBrowser for Functional RNA provides more tracks and richer features. The databases are available at http://www.ncrna.org/.

INTRODUCTION

Large-scale transcription analyses such as the H-Invitational (1) and FANTOM (2) projects reported a large number of transcripts that could not be associated with coding genes and were thus left unclassifiable. Several investigations revealed that these unclassifiable transcripts contain novel non-coding genes (3–5). The Functional RNA Database (fRNAdb) 1.0 (6) focused on acquiring and providing lines of evidence for the non-coding nature of these unclassifiable transcripts, to help filter out candidates for non-coding genes. However, drastic changes in the situation surrounding non-coding RNA research spurred us to move on to the next phase of database development.

Transcriptome analysis of natural RNA transcripts using high-throughput sequencing is one of the most attractive topics in recent research. Because deep sequencing produces abundant sequence data, computational analysis plays an important role in the rapid mapping and annotation of anonymous sequences. In particular, a sequence database is the most crucial part of computational analysis. Total RNAs extracted from a cell tend to have diverse compositions even when the RNAs are extracted via immunoprecipitation of specific proteins (7–9). They contain tRNAs, rRNAs, coding mRNAs, various transposons and non-coding RNAs including miRNAs and snoRNAs, together with a fair number of anonymous transcripts that match no existing annotation although they can be mapped to a genome. Such transcripts may contain evidence of novel non-coding RNA genes. To accommodate the large-scale sequence data from deep sequencing, we have completely redesigned and rebuilt fRNAdb. The major changes include an increase in the number of hosted sequences (from 13 693 to 509 795), sequence ontology (SO, http://song.sourceforge.net/) classification, a keyword search function and a Blast search service. The next section details these new features of the current version.

fRNAdb

fRNAdb is a sequence database hosting a large collection of non-coding RNA sequence data from public non-coding databases: H-invDB rel. 5.0 (1), FANTOM3 (2), miRBase 10.0 (10), NONCODE v1.0 (11), Rfam v8.1 (12), RNAdb v2.0 (13) and snoRNA-LBME-db rel. 3 (14). Because these databases contain many identical sequences, fRNAdb consolidates them into a set of unique sequences. Therefore, one fRNAdb sequence can have multiple accessions and multiple source organisms.
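The consolidation step described above can be illustrated with a minimal Python sketch. It is not the fRNAdb implementation: the record layout, the accession strings and the choice to treat T and U as equivalent are all assumptions made for illustration.

```python
from collections import defaultdict

def consolidate(records):
    """Group records with identical sequences into unique entries.

    `records` is an iterable of (accession, organism, sequence) tuples
    (a hypothetical layout, not the fRNAdb schema).
    """
    merged = defaultdict(lambda: {"accessions": set(), "organisms": set()})
    for accession, organism, sequence in records:
        # Assumption: case is ignored and DNA-style T is read as RNA-style U.
        key = sequence.upper().replace("T", "U")
        merged[key]["accessions"].add(accession)
        merged[key]["organisms"].add(organism)
    return dict(merged)

# Hypothetical accessions from two made-up source databases.
records = [
    ("SRC_A:0001", "Homo sapiens", "ugagguaguagguuguauaguu"),
    ("SRC_B:0042", "Mus musculus", "TGAGGTAGTAGGTTGTATAGTT"),
    ("SRC_A:0002", "Homo sapiens", "ACGUACGU"),
]
unique = consolidate(records)
```

In this sketch the first two records collapse into one unique sequence that carries two accessions and two source organisms, which mirrors the one-to-many relationship described in the text.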

Each sequence is associated with one or more mapping loci in multiple genomes, gene associations derived from the mapping information, sequence similarity to other registered sequences, and reference information. All sequences are mapped to multiple genomes (human, mouse, rat and fruit fly) to determine potential loci and potential homologs. The mapped loci can be inspected visually in our UCSC GenomeBrowser for Functional RNA, alongside tracks showing a variety of genomic elements provided by the original UCSC Genome Browser and our additional tracks, which are detailed in the next section.

fRNAdb allows users to search the sequences through keywords associated with them. Various kinds of information are associated with a sequence, as shown in Figure 1. The keywords are extracted from an identifier, description text, accession, SO, source organism, cross reference information, associated gene names, title/abstract/author text of reference papers, genome/chromosome/cytoband and sequence length. Common English words that may hinder efficient keyword search are eliminated from the index using the English dictionary of the open source spell checker aspell (http://aspell.net/).
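A keyword index of the kind described above could be sketched as follows. This is an illustrative sketch only: the entry fields, the sequence identifiers and the small `COMMON_WORDS` set (a stand-in for a full dictionary of common English words) are assumptions, and the actual fRNAdb indexing code is not shown in the source.

```python
import re
from collections import defaultdict

# Tiny stand-in for a dictionary of common English words (assumption).
COMMON_WORDS = {"a", "an", "and", "in", "of", "the", "to"}

def build_index(entries, common_words=COMMON_WORDS):
    """Map each lower-cased keyword to the set of sequence IDs it describes."""
    index = defaultdict(set)
    for seq_id, fields in entries.items():
        for text in fields.values():
            # Tokens start with a letter or digit and may contain '-', '_' or '.'.
            for token in re.findall(r"[A-Za-z0-9][A-Za-z0-9_.\-]*", str(text)):
                word = token.rstrip(".-_").lower()
                if word and word not in common_words:
                    index[word].add(seq_id)
    return index

# Hypothetical entries; field names are illustrative.
entries = {
    "SEQ_1": {"description": "Precursor of the let-7a microRNA",
              "organism": "Homo sapiens", "length": 80},
    "SEQ_2": {"description": "H/ACA box snoRNA",
              "organism": "Homo sapiens", "length": 134},
}
index = build_index(entries)
```

Looking up `index["sapiens"]` returns both identifiers, while common words such as "the" never enter the index, so they cannot dominate search results.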

Figure 1.

Diagram showing a registered sequence and its associations to other information.


Statistics of keywords associated with fRNAdb sequences can be browsed at the fRNAdb::Statistics page, which presents frequently used keywords corresponding to canonical terms in various ontology sets. These statistics provide an overview of the entire collection of non-coding RNA sequences from multiple aspects, using different ontologies such as SO, taxonomy and several ontologies of the Open Biomedical Ontologies (http://www.obofoundry.org/): human disease ontology and gene ontology (biological/molecular processes).

fRNAdb also provides sequence homology search using Blastn (15). For better usability, we divided our database into two parts: one contains sequences longer than 50 bases and the other contains sequences of 50 bases or shorter, because some users are not interested in small sequences, which include a large number of deep sequencing products. fRNAdb::Blast automatically adjusts some parameters according to the length of the query sequence to improve performance for short (<50 bases) queries. The adaptive parameters are the gap opening/extension costs, the E-value and the word size. All Blast parameters can be overridden by users. More details about fRNAdb are provided on the fRNAdb::Help page.
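The length-dependent parameter adjustment can be pictured with the short Python sketch below. The source names the adaptive parameters but not their values, so every number here is an illustrative placeholder, and the function name `blast_parameters` and its dictionary keys are hypothetical.

```python
def blast_parameters(query, overrides=None, short_cutoff=50):
    """Pick Blastn-style parameters by query length; user overrides win.

    All numeric values are placeholders chosen for illustration,
    not the settings used by fRNAdb::Blast.
    """
    if len(query) < short_cutoff:
        # Short queries: smaller word size and a more permissive E-value.
        params = {"word_size": 7, "evalue": 1000.0, "gapopen": 2, "gapextend": 1}
    else:
        params = {"word_size": 11, "evalue": 10.0, "gapopen": 5, "gapextend": 2}
    # Anything the user sets explicitly replaces the adaptive default.
    params.update(overrides or {})
    return params

short_params = blast_parameters("A" * 22)
long_params = blast_parameters("A" * 200)
custom_params = blast_parameters("A" * 22, {"evalue": 1e-3})
```

Applying the overrides last reflects the statement that all Blast parameters can be overridden by users, regardless of query length.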

UCSC GENOME BROWSER FOR FUNCTIONAL RNA

This database is an extended mirror of the UCSC Genome Browser (16) hosting genomes of humans (hg17 and hg18), mice (mm9), rats (rn3) and fruit flies (dm3). This database has been updated extensively. There were 15 original tracks in the previous version (6). We re-organized our tracks and added more custom tracks. For hg18, our extension includes 26 essential tracks for the ncRNA Prediction and Mapping Tracks group, five essential tracks for the Misc. Genomic Element Tracks group, and five essential tracks for the miRNA-related Tracks group. Tracks for the whole human tiling array of Affymetrix Transfrags (17) are available (currently supported only on hg17).

We have developed several tracks to support an improved presentation. For example, the miRNA Atlas (18) track can present the expression profiles of multiple miRNAs residing inside the GenomeBrowser window (Figure 2). Another example is the track for tissue-specific enhancers and their target loci (19). This track indicates an enhancer region with an orange box and its associated gene locus with a green bar, which is rendered in darker green when the locus is activated in more tissues. A further extension is applied to the conservation track, which shows not only a multiple genome alignment but also predicted common RNA secondary structures. When the user clicks on the conservation track in a window showing a genomic region ⩽500 bp, the prediction is performed dynamically on both strands. The browser then presents, for each strand, the predicted secondary structure, the minimum free energy and the number of base pairs. The estimated secondary structure can be downloaded as PDF graphics and in Stockholm format, which is an alignment file annotated with secondary structure. This file can be used to search a database for homologous secondary structures using the Infernal software package (http://infernal.janelia.org). A complete listing and details of the extension tracks are found on the Project Specific Custom Tracks page (http://www.ncrna.org/custom-tracks).
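As a rough illustration of what a Stockholm file with secondary structure annotation carries, the following Python sketch reads the consensus structure line (`#=GC SS_cons`) and counts its base pairs by matching brackets. The toy alignment is invented, only bracket symbols are treated as pairs, and this is not the code used by the browser.

```python
# Closing bracket -> matching opening bracket.
PAIRS = {")": "(", ">": "<", "]": "[", "}": "{"}

def read_ss_cons(stockholm_text):
    """Concatenate all '#=GC SS_cons' lines of a Stockholm alignment."""
    parts = []
    for line in stockholm_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "#=GC" and fields[1] == "SS_cons":
            parts.append(fields[2])
    return "".join(parts)

def base_pairs(structure):
    """Return sorted (i, j) column pairs for matched brackets."""
    stacks = {opener: [] for opener in PAIRS.values()}
    pairs = []
    for j, symbol in enumerate(structure):
        if symbol in stacks:
            stacks[symbol].append(j)
        elif symbol in PAIRS:
            opener = PAIRS[symbol]
            if not stacks[opener]:
                raise ValueError(f"unbalanced '{symbol}' at position {j}")
            pairs.append((stacks[opener].pop(), j))
        # Any other symbol marks an unpaired column.
    if any(stacks.values()):
        raise ValueError("unclosed bracket in structure")
    return sorted(pairs)

# Invented two-sequence alignment with a three-pair hairpin.
example = """# STOCKHOLM 1.0
seq1         GGGAAAUCCC
seq2         GGCAAAUGCC
#=GC SS_cons <<<....>>>
//
"""
structure = read_ss_cons(example)
pairs = base_pairs(structure)
```

Here `len(pairs)` gives the number of base pairs in the consensus structure, the same kind of per-strand summary the browser reports alongside the minimum free energy.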

Figure 2.

Mammalian miRNA Expression Atlas track showing miR-302a/b/c/d highly expressed at 3p (A). The detailed page shows expression profiles for these miRNAs with a heat map and actual read numbers previously reported by (20) (B).


FUNDING

This work was supported by the Functional RNA Project funded by New Energy and Industrial Technology Development Organization (NEDO). Funding for open access charge: Japan Biological Informatics Consortium (JBIC).

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors thank everyone in the bioinformatics group of the Functional RNA Project for constructive criticisms and fruitful discussions.

REFERENCES

1. Imanishi, Itoh, Suzuki, O’Donovan, Fukuchi, Koyanagi, Barrero, Tamura, Yamaguchi-Kabata, Tanino, et al. Integrative annotation of 21,037 human genes validated by full-length cDNA clones, PLoS Biol., 2004 (pg. 856–875)
2. Carninci, Kasukawa, Katayama, Gough, Frith, Maeda, Oyama, Ravasi, Lenhard, Wells, et al. The transcriptional landscape of the mammalian genome, Science, 2005, vol. 309 (pg. 1559–1563)
3. Inagaki, Numata, Kondo, Tomita, Yasuda, Kanai, Kageyama. Identification and expression analysis of putative mRNA-like non-coding RNA in Drosophila, Genes Cells, 2005 (pg. 1163–1173)
4. Sasaki, Sano, Kin, Asai, Hirose. Coordinated expression of ncRNAs and HOX mRNAs in the human HOXA locus, Biochem. Biophys. Res. Comm., 2007, vol. 357 (pg. 724–730)
5. Xue. Finding noncoding RNA transcripts from low abundance expressed sequence tags, Cell Res., 2008 (pg. 695–700)
6. Kin, Yamada, Terai, Okida, Yoshinari, Ono, Kojima, Kimura, Komori, Asai. fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences, Nucleic Acids Res., 2007 (pg. D145–D148)
7. Kawamura, Saito, Kin, Ono, Asai, Sunohara, Okada, Siomi. Drosophila endogenous small RNAs bind to Argonaute 2 in somatic cells, Nature, 2008, vol. 453 (pg. 793–797)
8. Czech, Malone, Zhou, Stark, Schlingeheyde, Dus, Perrimon, Kellis, Wohlschlegel, Sachidanandam, et al. An endogenous small interfering RNA pathway in Drosophila, Nature, 2008, vol. 453 (pg. 798–802)
9. Okamura, Chung, Ruby, Guo, Bartel, Lai. The Drosophila hairpin RNA pathway generates endogenous short interfering RNAs, Nature, 2008, vol. 453 (pg. 803–806)
10. Griffiths-Jones, Saini, van Dongen, Enright. miRBase: tools for microRNA genomics, Nucleic Acids Res., 2008 (pg. D154–D158)
11. Liu, Skogerbø, Zhao, Wang, Liu, Bai, Zhao, Chen. NONCODE v2.0: decoding the non-coding, Nucleic Acids Res., 2008 (pg. D170–D172)
12. Griffiths-Jones, Moxon, Marshall, Khanna, Eddy, Bateman. Rfam: annotating non-coding RNAs in complete genomes, Nucleic Acids Res., 2005 (pg. D121–D124)
13. Pang, Stephen, Dinger, Engström, Lenhard, Mattick. RNAdb 2.0—an expanded database of mammalian non-coding RNAs, Nucleic Acids Res., 2007 (pg. D178–D182)
14. Lestrade, Weber. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs, Nucleic Acids Res., 2006 (pg. D158–D162)
15. Altschul, Gish, Miller, Myers, Lipman. Basic local alignment search tool, J. Mol. Biol., 1990, vol. 215 (pg. 403–410)
16. Kuhn, Karolchik, Zweig, Trumbower, Thomas, Thakkapallayil, Sugnet, Stanke, Smith, Siepel, et al. The UCSC genome browser database: update 2007, Nucleic Acids Res., 2007 (pg. D668–D673)
17. Kapranov, Cheng, Dike, Nix, Duttagupta, Willingham, Stadler, Hertel, Hackermüller, Hofacker, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription, Science, 2007, vol. 316 (pg. 1484–1488)
18. Landgraf, Rusu, Sheridan, Sewer, Iovino, Aravin, Pfeffer, Rice, Kamphorst, Landthaler, et al. A mammalian microRNA expression atlas based on small RNA library sequencing, Cell, 2007, vol. 129 (pg. 1401–1414)
19. Pennacchio, Loots, Nobrega, Ovcharenko. Predicting tissue-specific enhancers in the human genome, Genome Res., 2007 (pg. 201–211)
20. Landgraf, Rusu, Sheridan, Sewer, Iovino, Aravin, Pfeffer, Rice, Kamphorst, Landthaler, et al. A mammalian microRNA expression atlas based on small RNA library sequencing, Cell, 2007, vol. 129 (pg. 1401–1414)

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
