TopHat: discovering splice junctions with RNA-Seq
- PMID: 19289445
- PMCID: PMC2672628
- DOI: 10.1093/bioinformatics/btp120
Abstract
Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or 'reads', can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.
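The core difficulty named above is placing a read that spans an exon-exon junction when the junction is not annotated. The read's two halves map to genomic positions separated by an intron. The toy sketch below illustrates that idea. It is not TopHat's pipeline, which is described in the full paper. It assumes exact matching and the common GT-AG rule: an intron begins with GT and ends with AG. The function `find_spliced_alignment` and its `min_anchor` and `max_intron` parameters are hypothetical names chosen for this illustration.

```python
from typing import Optional, Tuple

# Hypothetical toy aligner, not TopHat's pipeline. Assumes the common
# GT-AG rule: an intron begins with GT (donor) and ends with AG (acceptor).
DONOR, ACCEPTOR = "GT", "AG"


def find_spliced_alignment(read: str, genome: str, min_anchor: int = 4,
                           max_intron: int = 10_000) -> Optional[Tuple[int, int, int]]:
    """Try every split of `read` into a left anchor and a right anchor.

    Returns (split, exon1_end, exon2_start) for the first exact placement in
    which read[:split] ends at genome[exon1_end], read[split:] starts at
    genome[exon2_start], and the gap genome[exon1_end:exon2_start] looks like
    a GT...AG intron no longer than max_intron. Returns None otherwise.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        start = genome.find(left)
        while start != -1:
            exon1_end = start + split
            if genome.startswith(DONOR, exon1_end):
                # The shortest possible intron is "GTAG", so skip 4 bases.
                pos = genome.find(right, exon1_end + 4)
                while pos != -1 and pos - exon1_end <= max_intron:
                    if genome[pos - 2:pos] == ACCEPTOR:
                        return split, exon1_end, pos
                    pos = genome.find(right, pos + 1)
            start = genome.find(left, start + 1)
    return None


# Toy genome: flank, exon 1, a GT...AG intron, exon 2, flank.
genome = "CCCC" + "ACGATCGA" + "GT" + "T" * 8 + "AG" + "CTAGCATG" + "CCCC"
read = "ACGATCGA" + "CTAGCATG"  # spans the exon1/exon2 junction

hit = find_spliced_alignment(read, genome)
print(hit)  # (8, 12, 24)
```

The brute-force scan over every split point and every exact occurrence is only practical for toy inputs. A real aligner would use an index of the genome and tolerate mismatches.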
Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20,000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development.
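The throughput figure above supports a simple time estimate: CPU hours equal the read count divided by reads per CPU hour. The sketch below assumes a single CPU and constant throughput at the reported rate of nearly 2.2 million reads per CPU hour. The 40-million-read run is a hypothetical size and is not taken from the study. `cpu_hours` is a hypothetical helper name.

```python
# Throughput reported in the abstract: nearly 2.2 million reads per CPU hour.
READS_PER_CPU_HOUR = 2.2e6


def cpu_hours(n_reads: float, reads_per_hour: float = READS_PER_CPU_HOUR) -> float:
    """Estimated single-CPU mapping time, assuming constant throughput."""
    return n_reads / reads_per_hour


# Hypothetical run of 40 million reads; not a figure from the study.
hours = cpu_hours(40e6)
print(f"{hours:.1f} CPU hours; fits in a day: {hours < 24}")
# -> 18.2 CPU hours; fits in a day: True

# Reads one CPU could map in 24 hours at this rate (about 52.8 million).
print(f"{READS_PER_CPU_HOUR * 24 / 1e6:.1f} million reads per CPU day")
```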
Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Similar articles
- Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data. BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):503. doi: 10.1186/s12864-016-2896-7. PMID: 27556805. Free PMC article.
- MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010 Oct;38(18):e178. doi: 10.1093/nar/gkq622. Epub 2010 Aug 27. PMID: 20802226. Free PMC article.
- Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM). Bioinformatics. 2011 Sep 15;27(18):2518-28. doi: 10.1093/bioinformatics/btr427. Epub 2011 Jul 19. PMID: 21775302. Free PMC article.
- Mapping RNA-seq Reads with STAR. Curr Protoc Bioinformatics. 2015 Sep 3;51:11.14.1-11.14.19. doi: 10.1002/0471250953.bi1114s51. PMID: 26334920. Free PMC article. Review.
- Protocol for transcriptome assembly by the TransBorrow algorithm. Biol Methods Protoc. 2023 Nov 1;8(1):bpad028. doi: 10.1093/biomethods/bpad028. eCollection 2023. PMID: 38023349. Free PMC article. Review.
Cited by
- Impact of Polydeoxyribonucleotides on the Morphology, Viability, and Osteogenic Differentiation of Gingiva-Derived Stem Cell Spheroids. Medicina (Kaunas). 2024 Oct 1;60(10):1610. doi: 10.3390/medicina60101610. PMID: 39459397. Free PMC article.
- The genome of a wild Medicago species provides insights into the tolerant mechanisms of legume forage to environmental stress. BMC Biol. 2021 May 6;19(1):96. doi: 10.1186/s12915-021-01033-0. PMID: 33957908. Free PMC article.
- Vector transmission regulates immune control of Plasmodium virulence. Nature. 2013 Jun 13;498(7453):228-31. doi: 10.1038/nature12231. Epub 2013 May 29. PMID: 23719378. Free PMC article.
- High-throughput RNA sequencing of a formalin-fixed, paraffin-embedded autopsy lung tissue sample from the 1918 influenza pandemic. J Pathol. 2013 Mar;229(4):535-45. doi: 10.1002/path.4145. PMID: 23180419. Free PMC article.
- Genome-wide Profiling of RNA splicing in prostate tumor from RNA-seq data using virtual microarrays. J Clin Bioinforma. 2012 Nov 26;2(1):21. doi: 10.1186/2043-9113-2-21. PMID: 23181285. Free PMC article.