RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Bo Li¹, Colin N Dewey

¹ Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA.

BMC Bioinformatics. 2011 Aug 4;12:323.

PMID: 21816040
PMCID: PMC3163565
DOI: 10.1186/1471-2105-12-323

Abstract

Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments.

Results: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.

Conclusions: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

Figures

**Figure 1**
**The RSEM software workflow**. The standard RSEM workflow (indicated by the solid arrows) consists of running just two programs (rsem-prepare-reference and rsem-calculate-expression), which automate the use of Bowtie for read alignment. Workflows with an alternative alignment program additionally use the steps connected by the dashed arrows. Two additional programs, rsem-bam2wig and rsem-plot-model, allow for visualizing the output of RSEM. RNA-Seq data can also be simulated with RSEM via the workflow indicated by the dotted arrows.
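
The two-program workflow in Figure 1 can be sketched as a pair of command lines. The snippet below only assembles the argument lists and prints them; nothing is executed. The file names, reference name, and sample name are hypothetical, and the helper `rsem_commands` is an illustration, not part of RSEM. `--paired-end` is the option rsem-calculate-expression uses for paired reads; newer RSEM releases may need further options (for example to select the aligner), so consult each program's help output before running anything.

```python
import shlex


def rsem_commands(transcripts_fasta, reference_name, sample_name, reads, paired_end=False):
    """Return the two RSEM command lines as argument lists (nothing is run)."""
    # Step 1: build the reference from a set of transcript sequences.
    prepare = ["rsem-prepare-reference", transcripts_fasta, reference_name]

    # Step 2: align reads and estimate abundances against that reference.
    calculate = ["rsem-calculate-expression"]
    if paired_end:
        if len(reads) != 2:
            raise ValueError("paired-end data needs exactly two read files")
        calculate.append("--paired-end")
    elif len(reads) != 1:
        raise ValueError("this sketch expects exactly one single-end read file")
    calculate += list(reads) + [reference_name, sample_name]
    return prepare, calculate


if __name__ == "__main__":
    # Hypothetical inputs, e.g. transcripts from a de novo assembly.
    for cmd in rsem_commands(
        "transcripts.fa", "ref/assembly", "sample1",
        ["reads_1.fq", "reads_2.fq"], paired_end=True,
    ):
        print(shlex.join(cmd))
```

Because the reference is built from transcript sequences rather than a genome, the same two steps apply to a de novo assembly.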

**Figure 2**
**RSEM visualizations in the UCSC Genome Browser**. Example visualizations of RSEM output from mouse RNA-Seq data set SRR065546 in the UCSC Genome Browser. (A) Simultaneous visualization of the wiggle output, which gives the expected read depth at each position in the genome, and the BAM output, which gives probabilistically-weighted read alignments. In the BAM track, paired reads are connected by a thin black line and the darkness of the read indicates the posterior probability of its alignment (black meaning high probability). (B) An example gene for which the expected read depth (top track) differs greatly from the read depth computed from uniquely-mapping reads only (bottom track).
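
The contrast between the two tracks in panel (B) can be illustrated with a toy calculation. In this sketch each alignment is a hypothetical `(start, end, posterior, is_unique)` tuple with a half-open interval. The expected depth sums posterior weights over the covered positions, while the unique-only depth counts just the reads that map to a single location. The data layout and function names are assumptions for illustration and do not reflect RSEM's wiggle or BAM formats.

```python
def expected_depth(alignments, region_length):
    """Sum posterior alignment probabilities at each position of a region."""
    depth = [0.0] * region_length
    for start, end, posterior, _is_unique in alignments:
        for pos in range(max(start, 0), min(end, region_length)):
            depth[pos] += posterior
    return depth


def unique_depth(alignments, region_length):
    """Count coverage from uniquely-mapping reads only, each with weight 1."""
    unique = [(s, e, 1.0, True) for s, e, _p, is_unique in alignments if is_unique]
    return expected_depth(unique, region_length)


if __name__ == "__main__":
    # One unique read plus two multi-mapping reads placed here with partial weight.
    toy = [(0, 4, 1.0, True), (2, 6, 0.25, False), (2, 6, 0.5, False)]
    print(expected_depth(toy, 6))
    print(unique_depth(toy, 6))
```

Positions covered only by multi-mapping reads get zero unique-only depth but non-zero expected depth, which is the kind of discrepancy panel (B) shows.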

**Figure 3**
**Accuracy of four RNA-Seq quantification methods**. The percent error distributions of estimates from RSEM, IsoEM, Cufflinks, and rQuant on simulated RNA-Seq data. The error distributions of global isoform and gene estimates from PE data are shown in (A) and (B), respectively. Global isoform and gene estimate error distributions for SE data are shown in (C) and (D), respectively.
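
As a rough illustration of how such an error distribution is built, the sketch below computes a percent error per transcript against known simulated abundances and summarizes it with the median. The formula, 100 × |estimate − truth| / truth, is an assumed convention that may differ from the paper's exact definition, and the abundance values are made up.

```python
from statistics import median


def percent_errors(estimates, truth):
    """Percent error of each estimate relative to its true (simulated) abundance."""
    errors = []
    for name, true_value in truth.items():
        if true_value <= 0:
            continue  # percent error is undefined when the true abundance is zero
        estimate = estimates.get(name, 0.0)
        errors.append(100.0 * abs(estimate - true_value) / true_value)
    return errors


if __name__ == "__main__":
    # Hypothetical abundances for three transcripts.
    truth = {"t1": 10.0, "t2": 20.0, "t3": 0.0}
    estimates = {"t1": 12.0, "t2": 15.0, "t3": 1.0}
    errors = percent_errors(estimates, truth)
    print(errors, median(errors))
```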

**Figure 4**
**The directed graphical model used by RSEM**. The model consists of N sets of random variables, one per sequenced RNA-Seq fragment. For fragment n, its parent transcript, length, start position, and orientation are represented by the latent variables *G_n*, *F_n*, *S_n*, and *O_n*, respectively. For PE data, the observed variables (shaded circles) are the read lengths (*L_n^1* and *L_n^2*), quality scores (*Q_n^1* and *Q_n^2*), and sequences (*R_n^1* and *R_n^2*). For SE data, *L_n^2*, *Q_n^2*, and *R_n^2* are unobserved. The primary parameters of the model are given by the vector θ, which represents the prior probabilities of a fragment being derived from each transcript.
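
The role of θ can be made concrete with a deliberately simplified EM iteration. The sketch below assumes each fragment arrives with a precomputed likelihood P(fragment | transcript) for each candidate transcript, which collapses the length, position, orientation, quality, and sequence variables of the full model into one number. All names and data are hypothetical, and this is not RSEM's implementation.

```python
def em_theta(fragment_likelihoods, n_transcripts, n_iter=200):
    """Estimate theta, the probability that a fragment derives from each transcript.

    fragment_likelihoods: one dict per fragment, mapping a transcript index to
    P(fragment | transcript) for every transcript the fragment aligns to.
    """
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # E-step: split each fragment across its candidates by posterior weight.
        for likelihoods in fragment_likelihoods:
            weights = {t: theta[t] * lik for t, lik in likelihoods.items()}
            total = sum(weights.values())
            if total == 0.0:
                continue  # fragment has no support under the current theta
            for t, w in weights.items():
                counts[t] += w / total
        # M-step: theta becomes the normalized expected fragment counts.
        assigned = sum(counts)
        if assigned == 0.0:
            break
        theta = [c / assigned for c in counts]
    return theta


if __name__ == "__main__":
    # Toy data: 3 fragments unique to transcript 0, 1 unique to transcript 1,
    # and 2 fragments that fit both transcripts equally well.
    fragments = [{0: 1.0}] * 3 + [{1: 1.0}] + [{0: 1.0, 1: 1.0}] * 2
    print(em_theta(fragments, n_transcripts=2))
```

In the toy data the ambiguous fragments end up split in proportion to the evidence from the unique ones, rather than being discarded or divided evenly.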

Cited by

Fine mapping of a major QTL, qECQ8, for rice taste quality.
Zhu S, Tang G, Yang Z, Han R, Deng W, Shen X, Huang R. BMC Plant Biol. 2024 Oct 31;24(1):1034. doi: 10.1186/s12870-024-05744-8. PMID: 39478453.

Multiomic analysis of familial adenomatous polyposis reveals molecular pathways associated with early tumorigenesis.
Esplin ED, Hanson C, Wu S, Horning AM, Barapour N, Nevins SA, Jiang L, Contrepois K, Lee H, Guha TK, Hu Z, Laquindanum R, Mills MA, Chaib H, Chiu R, Jian R, Chan J, Ellenberger M, Becker WR, Bahmani B, Khan A, Michael B, Weimer AK, Esplin DG, Shen J, Lancaster S, Monte E, Karathanos TV, Ladabaum U, Longacre TA, Kundaje A, Curtis C, Greenleaf WJ, Ford JM, Snyder MP. Nat Cancer. 2024 Oct 30. doi: 10.1038/s43018-024-00831-z. Online ahead of print. PMID: 39478120.

Re-programming by a six-factor-secretome in the patient tumor ecosystem during nutrient stress and drug response.
Elghetany MT, Pan JL, Sekar K, Major A, Mf Su J, Adesina A, Hui KM, Li XN, Teo WY. iScience. 2024 Sep 11;27(10):110932. doi: 10.1016/j.isci.2024.110932. eCollection 2024 Oct 18. PMID: 39474075.

Integrative analysis of gene expression, protein abundance, and metabolomic profiling elucidates complex relationships in chronic hyperglycemia-induced changes in human aortic smooth muscle cells.
Bohara S, Bagheri A, Ertugral EG, Radzikh I, Sandlers Y, Jiang P, Kothapalli CR. J Biol Eng. 2024 Oct 29;18(1):61. doi: 10.1186/s13036-024-00457-w. PMID: 39473010.

Meloidogyne incognita genes involved in the repellent behavior in response to ascr#9.
Rao Z, Dai K, Han R, Xu C, Cao L. Sci Rep. 2024 Oct 28;14(1):25706. doi: 10.1038/s41598-024-76370-5. PMID: 39465253.
