Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies
- PMID: 31992223
- PMCID: PMC6986029
- DOI: 10.1186/s12864-020-6502-7
Abstract
Background: High-throughput RNA sequencing (RNA-seq) has become an important analytical tool in molecular biology. Although the utility and importance of this technique have grown, uncertainties about the proper analysis of RNA-seq data remain. Of primary concern, there is no consensus on which normalization and statistical methods are most appropriate for analyzing these data. The lack of standardized analytical methods leads to uncertainty in data interpretation and study reproducibility, especially for studies reporting high false discovery rates. In this study, we compared a recently developed normalization method, UQ-pgQ2, with three of the most frequently used alternatives, RLE (relative log expression), TMM (trimmed mean of M-values) and UQ (upper quartile normalization), in the analysis of RNA-seq data. We evaluated the performance of these methods for gene-level differential expression analysis by considering the following factors: 1) normalization combined with the choice of a Wald test from DESeq2 or an exact test/QL (quasi-likelihood) F-test from edgeR; 2) sample sizes in two balanced two-group comparisons; and 3) sequencing read depths.
Results: Using the MAQC RNA-seq datasets with small numbers of sample replicates, we found that UQ-pgQ2 normalization combined with an exact test can achieve better performance in terms of power and specificity in differential gene expression analysis. However, using an intra-group analysis of false positives from real and simulated data, we found that a Wald test performs better than an exact test when the number of sample replicates is large and that a QL F-test performs the best given sample sizes of 5, 10 and 15 for any normalization. The RLE, TMM and UQ methods performed similarly given a desired sample size.
Conclusion: We found that the UQ-pgQ2 method combined with an exact test/QL F-test is the best choice for controlling false positives when the sample size is small. When the sample size is large, UQ-pgQ2 with a QL F-test is a better choice for type I error control in an intra-group analysis. We observed that read depth has a minimal impact on differential gene expression analysis based on the simulated data.
Keywords: Differentially expressed genes; Normalization; RNA-seq; Sample sizes; Statistical test.
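As an illustration of what library size normalization computes, the following is a minimal sketch of per-sample scaling factors for two of the methods compared above, RLE (median of ratios) and UQ (upper quartile). It is a simplified NumPy version written for this note, not the DESeq2 or edgeR implementation; the function names and the toy count matrix are hypothetical, and UQ-pgQ2 and TMM are not covered.

```python
import numpy as np


def rle_size_factors(counts):
    """RLE-style (median-of-ratios) size factors for a genes x samples matrix.

    Simplified sketch: only genes with nonzero counts in every sample are used.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log geometric mean of each gene across samples (the pseudo-reference).
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per-sample median of the gene-wise log ratios to the reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))


def uq_size_factors(counts):
    """Upper-quartile size factors, rescaled to have geometric mean 1.

    Simplified sketch: genes with zero counts in all samples are dropped,
    and every sample is assumed to have a nonzero upper quartile.
    """
    counts = np.asarray(counts, dtype=float)
    nonzero = counts.sum(axis=1) > 0
    uq = np.percentile(counts[nonzero], 75, axis=0)
    return uq / np.exp(np.mean(np.log(uq)))


# Hypothetical toy data: sample 2 was sequenced exactly twice as deeply.
toy = np.array([[10, 20], [100, 200], [50, 100], [0, 0], [30, 60]])
rle = rle_size_factors(toy)
uq = uq_size_factors(toy)
normalized = toy / rle  # divide each sample (column) by its size factor
```

On this toy matrix both methods assign the second sample a factor twice that of the first, so the normalized columns coincide. On real data the methods differ in which genes drive the estimate, which is where the choice of normalization can matter.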
Conflict of interest statement
The authors declare that they have no competing interests.
Similar articles
- A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data. PLoS One. 2017 May 1;12(5):e0176185. doi: 10.1371/journal.pone.0176185. eCollection 2017. PMID: 28459823 Free PMC article.
- Benchmarking differential expression analysis tools for RNA-Seq: normalization-based vs. log-ratio transformation-based methods. BMC Bioinformatics. 2018 Jul 18;19(1):274. doi: 10.1186/s12859-018-2261-8. PMID: 30021534 Free PMC article.
- Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics. 2015 Oct 28;16:347. doi: 10.1186/s12859-015-0778-7. PMID: 26511205 Free PMC article.
- Statistical detection of differentially expressed genes based on RNA-seq: from biological to phylogenetic replicates. Brief Bioinform. 2016 Mar;17(2):243-8. doi: 10.1093/bib/bbv035. Epub 2015 Jun 24. PMID: 26108230 Review.
- Interpretation of differential gene expression results of RNA-seq data: review and integration. Brief Bioinform. 2019 Nov 27;20(6):2044-2054. doi: 10.1093/bib/bby067. PMID: 30099484 Free PMC article. Review.
Cited by
- Integrated analysis reveals the regulatory mechanism of the neddylation inhibitor MLN4924 on the metabolic dysregulation in rabbit granulosa cells. BMC Genomics. 2024 Mar 6;25(1):254. doi: 10.1186/s12864-024-10118-3. PMID: 38448814 Free PMC article.
- Genome scale CRISPR screens identify actin capping proteins as key modulators of therapeutic responses to radiation and immunotherapy. bioRxiv [Preprint]. 2024 Jan 15:2024.01.14.575614. doi: 10.1101/2024.01.14.575614. PMID: 38293095 Free PMC article. Preprint.
- WASF3 overexpression affects the expression of circular RNA hsa-circ-0100153, which promotes breast cancer progression by sponging hsa-miR-31, hsa-miR-767-3p, and hsa-miR-935. Heliyon. 2023 Nov 29;9(12):e22874. doi: 10.1016/j.heliyon.2023.e22874. eCollection 2023 Dec. PMID: 38125536 Free PMC article.
- Validation of potential RNA biomarkers for prostate cancer diagnosis and monitoring in plasma and urinary extracellular vesicles. Front Mol Biosci. 2023 Nov 30;10:1279854. doi: 10.3389/fmolb.2023.1279854. eCollection 2023. PMID: 38099195 Free PMC article.
- Unveiling metabolic integration in psyllids and their nutritional endosymbionts through comparative transcriptomics analysis. iScience. 2023 Sep 15;26(10):107930. doi: 10.1016/j.isci.2023.107930. eCollection 2023 Oct 20. PMID: 37810228 Free PMC article.