2019 Dec 20;20(Suppl 24):679.
doi: 10.1186/s12859-019-3247-x.

A protocol to evaluate RNA sequencing normalization methods

Zachary B Abrams et al. BMC Bioinformatics.

Abstract

Background: RNA sequencing technologies have allowed researchers to gain a better understanding of how the transcriptome affects disease. However, sequencing technologies often unintentionally introduce experimental error into RNA sequencing data. To counteract this, normalization methods are routinely applied with the intent of reducing the non-biologically derived variability inherent in transcriptomic measurements. However, the comparative efficacy of the various normalization techniques has not been tested in a standardized manner. Here we propose tests that evaluate numerous normalization techniques and apply them to a large-scale standard data set. These tests comprise a protocol that allows researchers to measure the amount of non-biological variability present in any data set after normalization has been performed, a crucial step in assessing the biological validity of data following normalization.

Results: In this study we present two tests to assess the validity of normalization methods applied to a large-scale data set collected for systematic evaluation purposes. We tested various RNASeq normalization procedures and concluded that transcripts per million (TPM) was the best performing normalization method based on its preservation of biological signal as compared to the other methods tested.

Conclusion: Normalization is of vital importance to accurately interpret the results of genomic and transcriptomic experiments. More work, however, needs to be performed to optimize normalization methods for RNASeq data. The present effort helps pave the way for more systematic evaluations of normalization methods across different platforms. With our proposed schema researchers can evaluate their own or future normalization methods to further improve the field of RNASeq normalization.

Keywords: Biological variability; Normalization; RNASeq; Standardization.
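
For reference, the TPM normalization named in the Results can be sketched as follows. This is a minimal illustration of the standard TPM definition (length-normalize the counts, then rescale each sample to sum to one million), not the authors' pipeline; the function name `tpm` and the toy counts and transcript lengths are assumptions made for this example.

```python
import numpy as np

def tpm(counts, lengths_bp):
    """Transcripts per million for one sample.

    counts     : raw read counts per transcript (hypothetical input)
    lengths_bp : transcript lengths in base pairs (hypothetical input)
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1000.0
    # Reads per kilobase: removes the bias toward longer transcripts.
    rate = counts / lengths_kb
    # Rescale so the sample sums to one million.
    return rate / rate.sum() * 1e6

# Toy example: three transcripts in a single sample.
example = tpm([10, 20, 30], [1000, 2000, 1500])
print(example, example.sum())
```

Because every sample sums to the same total after this step, TPM values are compositional: a change in one highly expressed transcript shifts the values of all others in that sample.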

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Bar plot of normalization methods and their relative errors from a two-way ANOVA. The MSE for each feature (site and biological condition) measures the amount of variance attributed to that feature. The top, narrow-striped bar is site-dependent variability (batch effects); the solid bar is biological variability; and the bottom, wide-striped bar is the residual variability.
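
The variance partition described in this caption can be illustrated with a small sketch. It assumes a balanced design (the same number of replicates for every site and condition) and an additive model without an interaction term; the source does not state which ANOVA model the authors fitted, so both choices, along with the function name `two_way_mean_squares` and the synthetic data, are assumptions.

```python
import numpy as np

def two_way_mean_squares(y):
    """Mean squares for site, condition, and residual.

    y has shape (n_sites, n_conditions, n_replicates) and holds the
    normalized expression of one gene. Balanced, additive model (assumed).
    """
    y = np.asarray(y, dtype=float)
    n_site, n_cond, n_rep = y.shape
    grand = y.mean()
    site_means = y.mean(axis=(1, 2))
    cond_means = y.mean(axis=(0, 2))
    # Sums of squares for each main effect.
    ss_site = n_cond * n_rep * np.sum((site_means - grand) ** 2)
    ss_cond = n_site * n_rep * np.sum((cond_means - grand) ** 2)
    ss_total = np.sum((y - grand) ** 2)
    # Whatever the two main effects do not explain is residual.
    ss_resid = ss_total - ss_site - ss_cond
    df_resid = y.size - n_site - n_cond + 1
    return {
        "site": ss_site / (n_site - 1),
        "condition": ss_cond / (n_cond - 1),
        "residual": ss_resid / df_resid,
    }

# Synthetic data: 3 sites, 4 conditions, 2 replicates.
rng = np.random.default_rng(0)
condition_effect = np.array([0.0, 1.0, 3.0, 4.0])[None, :, None]
site_effect = np.array([0.0, 0.2, -0.2])[:, None, None]
data = 10 + condition_effect + site_effect + rng.normal(0, 0.1, (3, 4, 2))
print(two_way_mean_squares(data))
```

Under this reading, a normalization method performs well when the condition mean square is large relative to the site and residual mean squares.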
Fig. 2
Raw read counts for the gene TP53 from the Australian Genome Research Facility site, arranged by sample type (A, C, D, and B). The Y axis shows the read counts. The blank space in the middle marks where a 50–50 mixture of A and B would fall if one had been created and measured. Leaving this space blank allows a visual check of linearity: the mixture samples C and D should fall on the straight line between A and B. If C or D does not fall on that line, the normalization method is imposing unwanted structure on the data. If all four samples (A, B, C, and D) form a clear linear relationship, the normalization method is representing the true biological structure of the data.
Fig. 3
(a) TP53. (b) POLR2A. (c) CD59. (d) GAPDH.
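
The linearity check described for Fig. 2 can be expressed numerically. The sketch below assumes that C and D are 3:1 and 1:3 mixtures of A and B; the captions here do not give the mixing ratios, so the default fractions, the function name `mixture_residuals`, and the toy values are all assumptions to adjust for the actual design.

```python
import numpy as np

def mixture_residuals(a, b, c, d, frac_b_in_c=0.25, frac_b_in_d=0.75):
    """Deviation of mixture samples C and D from the A-B line.

    a, b, c, d : normalized values of one gene across replicates.
    frac_b_in_c, frac_b_in_d : assumed fraction of sample B in each
    mixture (hypothetical defaults, not taken from the source).
    Returns (residual_c, residual_d); values near zero mean the
    mixtures sit on the straight line between A and B.
    """
    mean_a, mean_b = np.mean(a), np.mean(b)
    expected_c = (1 - frac_b_in_c) * mean_a + frac_b_in_c * mean_b
    expected_d = (1 - frac_b_in_d) * mean_a + frac_b_in_d * mean_b
    return np.mean(c) - expected_c, np.mean(d) - expected_d

# Toy values for one gene after some normalization.
res_c, res_d = mixture_residuals(
    a=[100, 102, 98], b=[200, 198, 202], c=[126, 124], d=[174, 177]
)
print(res_c, res_d)
```

Comparing these residuals across normalization methods, scaled by the within-sample spread, gives a rough numeric counterpart to the visual inspection in Figs. 2 and 3.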
