Biomed Res Int. 2015:2015:621690.
doi: 10.1155/2015/621690. Epub 2015 Jun 15.

The Impact of Normalization Methods on RNA-Seq Data Analysis

J Zyprych-Walczak et al. Biomed Res Int. 2015.

Abstract

High-throughput sequencing technologies, such as the Illumina HiSeq, are powerful tools for investigating a wide range of biological and medical problems. The massive and complex data sets produced by the sequencers create a need for statistical and computational methods that can handle the analysis and management of the data. Data normalization is one of the most crucial steps of data processing and must be considered carefully, because it has a profound effect on the results of the analysis. In this work, we present a comprehensive comparison of five normalization methods related to sequencing depth that are widely used for transcriptome sequencing (RNA-seq) data, and we assess their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow for selecting the optimal normalization procedure for a particular data set. The workflow includes calculating bias and variance values for the control genes, the sensitivity and specificity of the methods, and classification errors, as well as generating diagnostic plots. Combining this information facilitates the selection of the most appropriate normalization method for the studied data sets and shows which methods can be used interchangeably.
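The abstract does not name the five methods or define how bias and variance are computed, so the following is only a minimal sketch of the general idea under stated assumptions. It uses total-count scaling as a stand-in for a sequencing-depth normalization, and it assumes that control genes should show no between-group difference (bias) and little spread across samples (variance) on the log2 scale. The function names, the toy counts, and the two-group labels are all hypothetical.

```python
import math
from statistics import mean, pvariance


def total_count_factors(counts):
    """Per-sample scaling factors from library size (total-count scaling).

    counts maps gene name -> list of raw counts, one per sample.
    Factors are scaled so that their mean is 1.
    """
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    avg_total = mean(totals)
    return [t / avg_total for t in totals]


def normalize(counts, factors):
    """Divide every count by the scaling factor of its sample."""
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}


def control_gene_bias_variance(expr, control_genes, groups):
    """Assumed definitions, not taken from the paper:

    bias     = mean over control genes of the between-group difference
               in mean log2(expression + 1)
    variance = mean over control genes of the population variance of
               log2(expression + 1) across all samples
    groups is a list of 0/1 labels, one per sample.
    """
    diffs, variances = [], []
    for g in control_genes:
        logs = [math.log2(v + 1) for v in expr[g]]
        grp0 = [x for x, lab in zip(logs, groups) if lab == 0]
        grp1 = [x for x, lab in zip(logs, groups) if lab == 1]
        diffs.append(mean(grp1) - mean(grp0))
        variances.append(pvariance(logs))
    return mean(diffs), mean(variances)


# Toy data: samples 3 and 4 were sequenced at twice the depth of samples 1 and 2.
counts = {
    "geneA": [10, 10, 20, 20],
    "geneB": [100, 100, 200, 200],
    "ctrl": [50, 50, 100, 100],  # hypothetical control gene
}
groups = [0, 0, 1, 1]

factors = total_count_factors(counts)
normalized = normalize(counts, factors)

raw_bias, raw_var = control_gene_bias_variance(counts, ["ctrl"], groups)
norm_bias, norm_var = control_gene_bias_variance(normalized, ["ctrl"], groups)

print("raw:        bias=%.3f variance=%.3f" % (raw_bias, raw_var))
print("normalized: bias=%.3f variance=%.3f" % (norm_bias, norm_var))
```

In this toy case the depth difference alone produces a clear bias in the raw control gene, and it vanishes after scaling. Comparing such values across several candidate normalizations is one way to rank them for a given data set.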

Figures

Figure 1
Bar plots of the differentially expressed genes (DEGs) at specified levels of count abundance in all studied data sets. The x-axis shows the normalization methods, and the y-axis shows the number of DEGs determined after each normalization procedure. The bar colours represent groups of genes at particular expression levels.
Figure 2
Diagnostic plots for the AML data set. Besides the five normalization methods, raw data (RD) were also included in the plots as a benchmark. (a) presents calculated normalization factor values across the samples by each method. The samples are ordered by the minimum values of normalization factors. (b) shows 95% confidence intervals for the mean of the percentages of classification errors calculated for each method based on five selected classifiers. (c) shows the numbers of common DE genes across each pair of normalization methods. The size and shading of the circles represent the average percentage value of common genes between each pair of methods. (d) presents the results of clustering of the normalization methods based on 20 common DE genes found by each of these methods. A dendrogram was created with hierarchical clustering based on Ward's method.
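The pairwise comparison described for panel (c) can be sketched as follows. This is an illustration only: the caption does not define the "average percentage value of common genes", so the code assumes it is the mean of the two directional overlaps (common genes as a percentage of each method's own DE list). The method labels M1 to M3 and the gene names are hypothetical.

```python
from itertools import combinations


def pairwise_overlap(de_sets):
    """For every pair of methods, return (number of common DE genes,
    average percentage of common genes).

    de_sets maps method name -> set of DE gene identifiers.
    The average percentage is an assumed definition: the mean of
    100 * |A and B| / |A| and 100 * |A and B| / |B|.
    """
    result = {}
    for a, b in combinations(sorted(de_sets), 2):
        common = de_sets[a] & de_sets[b]
        pct_a = 100.0 * len(common) / len(de_sets[a]) if de_sets[a] else 0.0
        pct_b = 100.0 * len(common) / len(de_sets[b]) if de_sets[b] else 0.0
        result[(a, b)] = (len(common), (pct_a + pct_b) / 2)
    return result


# Hypothetical DE gene lists produced after three different normalizations.
de_sets = {
    "M1": {"g1", "g2", "g3", "g4"},
    "M2": {"g3", "g4"},
    "M3": {"g5"},
}

overlaps = pairwise_overlap(de_sets)
for pair, (n_common, avg_pct) in overlaps.items():
    print(pair, n_common, avg_pct)
```

Pairs with a high average percentage would be candidates for methods that can be used interchangeably on that data set.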
