Dispersion estimation and its effect on test performance in RNA-seq data analysis: a simulation-based comparison of methods
- PMID: 24349066
- PMCID: PMC3857202
- DOI: 10.1371/journal.pone.0081415
Abstract
A central goal of RNA sequencing (RNA-seq) experiments is to detect differentially expressed genes. In the ubiquitous negative binomial model for RNA-seq data, each gene is given a dispersion parameter, and correctly estimating these dispersion parameters is vital to detecting differential expression. Because the dispersions control the variances of the gene counts, underestimation may lead to false discoveries, while overestimation may lower the rate of true detection. After briefly reviewing several popular dispersion estimation methods, this article describes a simulation study that compares them in terms of point estimation and their effect on the performance of tests for differential expression. The methods that maximize test performance are those that use a moderate degree of dispersion shrinkage: DSS, Tagwise wqCML, and Tagwise APL. In practical RNA-seq data analysis, we recommend using one of these moderate-shrinkage methods with the QLShrink test in the QuasiSeq R package.
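To make the role of the dispersion concrete, the sketch below assumes the common negative binomial parameterization in which a count with mean mu and dispersion phi has variance mu + phi * mu^2. It computes a simple method-of-moments dispersion estimate for one gene and shrinks it linearly toward a common value. The function names, the `weight` parameter, and the linear shrinkage rule are hypothetical and chosen for illustration only; this is not the DSS, Tagwise wqCML, or Tagwise APL procedure, each of which uses its own likelihood-based machinery.

```python
from statistics import mean, variance


def nb_variance(mu, phi):
    """Variance of a negative binomial count with mean mu and dispersion phi.

    Assumes the parameterization Var = mu + phi * mu^2.
    """
    return mu + phi * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments dispersion estimate (s^2 - m) / m^2, floored at zero.

    An illustrative estimator only; it ignores library-size normalization.
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    s2 = variance(counts)  # unbiased sample variance
    return max((s2 - m) / m ** 2, 0.0)


def shrink(gene_phi, common_phi, weight):
    """Hypothetical linear shrinkage of a per-gene estimate toward a common value.

    weight = 0 keeps the per-gene estimate; weight = 1 returns the common value.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * gene_phi + weight * common_phi


if __name__ == "__main__":
    counts = [10, 12, 8, 30, 5]          # made-up counts for one gene
    phi_gene = moment_dispersion(counts)
    phi_moderate = shrink(phi_gene, common_phi=0.1, weight=0.5)
    # A smaller dispersion implies a smaller modeled variance at the same mean,
    # which is why underestimation can make differences look more significant.
    print(phi_gene, phi_moderate)
    print(nb_variance(mean(counts), phi_gene), nb_variance(mean(counts), phi_moderate))
```

With a small number of replicates the per-gene estimate is noisy, which is the motivation for borrowing information across genes. The weight here simply stands in for the "degree of shrinkage" that the abstract refers to.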
Similar articles
- Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Stat Appl Genet Mol Biol. 2012 Oct 22;11(5):/j/sagmb.2012.11.issue-5/1544-6115.1826/1544-6115.1826.xml. doi: 10.1515/1544-6115.1826. PMID: 23104842
- Sample size calculations for the differential expression analysis of RNA-seq data using a negative binomial regression model. Stat Appl Genet Mol Biol. 2019 Jan 22;18(1):/j/sagmb.2019.18.issue-1/sagmb-2018-0021/sagmb-2018-0021.xml. doi: 10.1515/sagmb-2018-0021. PMID: 30667368
- RnaSeqSampleSize: real data based sample size estimation for RNA sequencing. BMC Bioinformatics. 2018 May 30;19(1):191. doi: 10.1186/s12859-018-2191-5. PMID: 29843589 Free PMC article.
- Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size. Bioinformatics. 2013 May 15;29(10):1275-82. doi: 10.1093/bioinformatics/btt143. Epub 2013 Apr 14. PMID: 23589650 Free PMC article.
- Statistical detection of differentially expressed genes based on RNA-seq: from biological to phylogenetic replicates. Brief Bioinform. 2016 Mar;17(2):243-8. doi: 10.1093/bib/bbv035. Epub 2015 Jun 24. PMID: 26108230 Review.
Cited by
- Differentially expressed heterogeneous overdispersion genes testing for count data. PLoS One. 2024 Jul 17;19(7):e0300565. doi: 10.1371/journal.pone.0300565. eCollection 2024. PMID: 39018275 Free PMC article.
- Chemotherapy for pain: reversing inflammatory and neuropathic pain with the anticancer agent mithramycin A. Pain. 2024 Jan 1;165(1):54-74. doi: 10.1097/j.pain.0000000000002972. Epub 2023 Jun 27. PMID: 37366593 Free PMC article.
- Detection of genes with differential expression dispersion unravels the role of autophagy in cancer progression. PLoS Comput Biol. 2023 Mar 9;19(3):e1010342. doi: 10.1371/journal.pcbi.1010342. eCollection 2023 Mar. PMID: 36893104 Free PMC article.
- Cancer-related cognitive impairment is associated with perturbations in inflammatory pathways. Cytokine. 2021 Dec;148:155653. doi: 10.1016/j.cyto.2021.155653. Epub 2021 Aug 10. PMID: 34388477 Free PMC article.
- Genomic profiling of Nipah virus using NGS driven RNA-Seq expression data. Bioinformation. 2019 Dec 31;15(12):853-862. doi: 10.6026/97320630015853. eCollection 2019. PMID: 32256005 Free PMC article.