Comparative Study
Nucleic Acids Res. 2012 Nov 1;40(20):10084-97.
doi: 10.1093/nar/gks804. Epub 2012 Sep 10.

A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae

Intawat Nookaew et al. Nucleic Acids Res. 2012.

Abstract

RNA-seq has recently become an attractive method of choice in studies of transcriptomes, promising several advantages compared with microarrays. In this study, we sought to assess the contribution of the different analytical steps involved in the analysis of RNA-seq data generated with the Illumina platform, and to perform a cross-platform comparison based on the results obtained with Affymetrix microarrays. As a case study for our work, we used the Saccharomyces cerevisiae strain CEN.PK 113-7D, grown under two different conditions (batch and chemostat). Here, we assess the influence of genetic variation on the estimation of gene expression levels using three different aligners for read mapping (Gsnap, Stampy and TopHat) on the S288c genome, and the capabilities of five different statistical methods to detect differential gene expression (baySeq, Cuffdiff, DESeq, edgeR and NOISeq). We also explore the consistency between RNA-seq analysis using a reference genome and a de novo assembly approach. High reproducibility among biological replicates (correlation ≥ 0.99) and high consistency between the two platforms for analysis of gene expression levels (correlation ≥ 0.91) are reported. The results of differential gene expression identification derived from the different statistical methods, as well as their integrated analysis results based on gene ontology annotation, are in good agreement. Overall, our study provides a useful and comprehensive comparison between the two platforms (RNA-seq and microarrays) for gene expression analysis and addresses the contribution of the different steps involved in the analysis of RNA-seq data.

Figures

Figure 1.
Study design overview. The same initial culture of S. cerevisiae strain CEN.PK 113-7D was used for DNA-seq (gray line) and transcriptome analysis to reduce technical variation and polymorphism. The strain was cultivated under two different metabolic conditions, in well-controlled batch (red line) and chemostat (blue line) fermentations. Samples for DNA and RNA extraction were taken from the triplicate cultures. The extracted RNA was used, in parallel, for microarray analysis on the Affymetrix platform (dashed lines) and for RNA-seq (solid line). DNA-seq and RNA-seq were performed on the Illumina platform. DNA-seq data were used to identify the genetic variation (SNVs and indels) between the strain CEN.PK 113-7D and the reference strain S288c and to identify genetic variations in the microarray probes. The RNA-seq data were analyzed with the reference mapping approach and the de novo assembly approach. The results obtained with the different methods were compared with each other and cross-compared with the results from microarray analysis.
Figure 2.
Sample-wise and gene-wise correlation of transcriptome data from microarray and RNA-seq with different processing methods. (A) Upper-right triangle matrix: pairwise correlation of different biological replicates from batch and chemostat cultivations (the normalized signals were used for microarray analysis and the FPKM values for RNA-seq analysis). The color intensities (scale in the side bar) and the numbers indicate the degree of pairwise correlation. (B) Lower-left triangle matrix: scatter plots based on fold changes of gene expression (average values, batch vs chemostat). The red numbers indicate the level of pairwise correlation between different methods. On the diagonal of the triangle matrix, the distribution of fold changes for each processing method is presented as histograms. Array = microarray; Gsnap = quality reads processed with the Gsnap aligner after removal of potential PCR duplicates; n.Gsnap = quality reads processed with the Gsnap aligner without removal of potential PCR duplicates; Stampy = quality reads processed with the Stampy aligner after removal of potential PCR duplicates; n.Stampy = quality reads processed with the Stampy aligner without removal of potential PCR duplicates; TopHat = quality reads processed with the TopHat aligner after removal of potential PCR duplicates; n.TopHat = quality reads processed with the TopHat aligner without removal of potential PCR duplicates. (C) Yellow open circles, red open triangles, cyan plus signs and blue crosses represent the average gene expression values from microarray analysis of batch and chemostat cultivations and from RNA-seq analysis of batch and chemostat cultivations, respectively. On the left, the distributions of average expression values from microarray and RNA-seq analysis are presented as an orange boxplot and a dark cyan boxplot (batch and chemostat cultivation conditions combined), respectively. At the bottom, the distribution of the gene-wise correlation values is presented as a white boxplot and a density plot.
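The quantities named in the Figure 2 caption (FPKM values, fold changes and pairwise correlation) can be illustrated with a minimal Python sketch. It uses the standard FPKM definition; the choice of Pearson correlation, the pseudocount in the fold change and all function names are assumptions made for illustration, since the caption does not state which correlation coefficient or fold-change convention was used.

```python
import math


def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """log2 ratio of two expression values.

    The pseudocount is an assumption added here to avoid division by zero;
    it is not taken from the paper.
    """
    return math.log2((expr_a + pseudocount) / (expr_b + pseudocount))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, not data from the study.
array_fold_changes = [1.2, -0.5, 3.1, 0.0, -2.2]
rnaseq_fold_changes = [1.0, -0.7, 3.4, 0.2, -2.0]
platform_correlation = pearson(array_fold_changes, rnaseq_fold_changes)
```

Applied gene by gene across samples instead of sample by sample across genes, the same `pearson` helper would give a gene-wise correlation of the kind summarized in panel (C).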
Figure 3.
Comparison of the number of DGEs identified by different statistical methods from RNA-seq data and cross-comparison with DGEs identified from microarray data. (A) Venn diagram comparing differential gene expression based on RNA-seq data (results from the Stampy aligner) identified through five different statistical methods: Cuffdiff, DESeq, NOISeq, edgeR and baySeq. (B) Venn diagram of the cross-comparison of differential gene expression based on RNA-seq data (results from the Stampy aligner) identified through the Cuffdiff, NOISeq and DESeq methods versus differential gene expression from microarray data (comparisons for other method combinations are shown in Supplementary Figure S2). (C) Venn diagram of the cross-comparison of DGEs based on RNA-seq data identified through the Cuffdiff method, using the three different aligners. Similar comparisons using baySeq, DESeq, edgeR and NOISeq are provided in Supplementary Figure S3. The potential factors underlying the differences in genes identified with each method are presented as a percentage pie chart. All Venn diagrams were built using a cut-off of Q-value < 1e−5 for all methods except NOISeq, for which P > 0.875 was used.
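The set comparison behind Venn diagrams of this kind can be sketched in Python as follows. The two cut-offs are the ones stated in the caption; reading the NOISeq value as a probability of differential expression (higher is more significant), the function names and the gene identifiers are assumptions for illustration, and the toy scores are hypothetical rather than results from the study.

```python
Q_VALUE_CUTOFF = 1e-5        # cut-off stated in the caption for Q-value methods
NOISEQ_PROB_CUTOFF = 0.875   # cut-off stated in the caption for NOISeq


def significant_genes(scores, method):
    """Return the set of genes called differentially expressed by one method.

    `scores` maps gene -> Q-value, except for NOISeq, where it is assumed
    to map gene -> probability of differential expression.
    """
    if method == "NOISeq":
        return {g for g, p in scores.items() if p > NOISEQ_PROB_CUTOFF}
    return {g for g, q in scores.items() if q < Q_VALUE_CUTOFF}


def overlap_summary(calls):
    """Summarize agreement between methods.

    `calls` maps method name -> set of significant genes (at least one method).
    Returns (genes called by all methods, genes called by any method,
    genes called by exactly one method, keyed by that method).
    """
    core = set.intersection(*calls.values())
    union = set().union(*calls.values())
    unique = {
        method: genes - set().union(
            *(other for name, other in calls.items() if name != method)
        )
        for method, genes in calls.items()
    }
    return core, union, unique


# Hypothetical toy scores, not data from the study.
toy_scores = {
    "DESeq": {"geneA": 1e-8, "geneB": 1e-6, "geneC": 0.2},
    "Cuffdiff": {"geneA": 1e-7, "geneB": 1e-3, "geneC": 1e-9},
    "NOISeq": {"geneA": 0.95, "geneB": 0.90, "geneC": 0.5},
}
calls = {m: significant_genes(s, m) for m, s in toy_scores.items()}
core, union, unique = overlap_summary(calls)
```

The same summary applies unchanged when the dictionary keys are aligners rather than statistical methods, as in panel (C).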
Figure 4.
Coverage plots of mapped reads show the different capabilities of the three aligners. (A) The ORF YHR215W (PHO11) contains many SNVs in the coding region (green box). (B) The ORF YHL008C contains many indels in the coding region (green box). (C) The ORF YLR344W (RPL26A) contains a small exon (green box).
Figure 5.
Comparison of transcriptome analysis through the de novo assembly and reference genome mapping approaches and cross-comparison with microarray data. (A) Upper-right triangle matrix: pairwise correlation of different biological replicates from batch and chemostat cultivations (the normalized signals were used for microarray analysis and the FPKM values for RNA-seq analysis). The color intensities (scale in the side bar) and the numbers indicate the degree of pairwise correlation. (B) Lower-left triangle matrix: scatter plots based on fold changes of gene expression (average values, batch versus chemostat). The red numbers indicate the level of pairwise correlation between different methods. On the diagonal of the triangle matrix, the distribution of fold changes for each processing method is presented as histograms. Array = microarray; De novo = de novo assembly approach; Ref. mapped = reference genome read-mapping approach. For both RNA-seq approaches, quality reads were processed with the TopHat aligner after removal of potential PCR duplicates. (C) Comparison of the number of transcripts detected by the different approaches. (D) Comparison of the number of DGEs identified by the different transcriptome analyses of RNA-seq data and cross-comparison with differential gene expression identified from microarray data.
Figure 6.
Clustered heatmap of GO enrichment analysis. The color intensities indicate the level of enrichment score of each GO term.
