Comparative Study
BMC Genomics. 2015 Sep 3;16(1):675.
doi: 10.1186/s12864-015-1876-7.

Comparison of stranded and non-stranded RNA-seq transcriptome profiling and investigation of gene overlap

Shanrong Zhao et al. BMC Genomics.

Abstract

Background: While RNA-sequencing (RNA-seq) is becoming a powerful technology for transcriptome profiling, one significant shortcoming of the first-generation RNA-seq protocol is that it does not retain information about the strand from which each transcript originated. Without strand information, it is difficult and sometimes impossible to accurately quantify expression levels for genes with overlapping genomic loci that are transcribed from opposite strands. Modified protocols, known as strand-specific or stranded RNA-seq, have recently made it possible to retain this strand information. Here, we evaluated the advantages of stranded RNA-seq over non-stranded RNA-seq in transcriptome profiling of whole blood RNA samples, and investigated the influence of gene overlaps on gene expression profiling results, both on practical RNA-seq datasets and from a theoretical perspective.

Results: Our results demonstrated a substantial impact of stranded RNA-seq on transcriptome profiling and gene expression measurements. As many as 1751 genes in Gencode Release 19 were identified as differentially expressed when comparing stranded and non-stranded RNA-seq of whole blood samples. Antisense genes and pseudogenes were significantly enriched among the differentially expressed genes. Because stranded RNA-seq retains the strand information of each read, it can resolve read ambiguity in overlapping genes transcribed from opposite strands, which yields more accurate quantification of gene expression levels than traditional non-stranded RNA-seq. In the human genome, it is not uncommon to find genomic loci where both strands encode distinct genes. Of the more than 57,800 annotated genes in Gencode Release 19, an estimated 19 % (about 11,000) overlap with genes transcribed from the opposite strand. Based on our whole blood mRNA-seq datasets, the fractions of overlapping nucleotide bases on the same and opposite strands were estimated at 2.94 % and 3.1 %, respectively. The corresponding theoretical estimates, 3 % and 3.6 %, agree well with these findings.

Conclusions: Stranded RNA-seq provides a more accurate estimate of transcript expression compared with non-stranded RNA-seq, and is therefore the recommended RNA-seq approach for future mRNA-seq studies.


Figures

Fig. 1
Non-stranded versus stranded RNA-seq protocol. The stranded protocol differs from the non-stranded protocol in two ways. First, during cDNA synthesis, second-strand synthesis proceeds as normal except that the nucleotide mix contains dUTP instead of dTTP. Second, after library preparation, a second-strand digestion step is added. This step ensures that only the first strand survives the subsequent PCR amplification step, thereby preserving the strand information of the libraries
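The strand bookkeeping implied by this protocol can be sketched in a few lines of Python. This is a minimal illustration, assuming a dUTP first-strand library in which read 1 is the reverse complement of the original RNA (and, for paired-end data, read 2 matches the RNA's strand); the function names `flip` and `transcript_strand` are hypothetical and not part of the authors' pipeline.

```python
# Minimal sketch, assuming a dUTP first-strand library: only the first cDNA
# strand survives PCR, so read 1 aligns to the strand opposite the RNA,
# while read 2 (if paired) aligns to the RNA's own strand.

def flip(strand: str) -> str:
    """Return the opposite genomic strand ('+' <-> '-')."""
    if strand not in ("+", "-"):
        raise ValueError(f"unknown strand: {strand!r}")
    return "-" if strand == "+" else "+"


def transcript_strand(read_strand: str, is_read1: bool = True) -> str:
    """Infer the strand of the originating transcript from the strand a read
    aligned to, under the assumed dUTP (first-strand) convention."""
    return flip(read_strand) if is_read1 else read_strand


print(transcript_strand("-"))                  # + (read 1 on '-' => '+' transcript)
print(transcript_strand("-", is_read1=False))  # - (read 2 keeps the RNA's strand)
```

In a non-stranded library this inference is impossible, because either cDNA strand can be sequenced as read 1.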
Fig. 2
Workflow for RNA-seq data analysis
Fig. 3
Metrics for RNA-seq. a) The sequencing library size; b) the mapping summaries for sequence reads; c) the counting summaries for uniquely mapped reads; d) the ambiguous reads arising from gene overlaps; on average, the percentage of ambiguous reads drops by approximately 3.1 % from non-stranded to stranded RNA-seq, and this drop roughly represents the overlap arising from opposite strands; e) the correlation of gene expression profiles among the eight samples; the samples clearly cluster by sequencing protocol; f) the boxplot of gene expression
Fig. 4
Estimated gene overlaps in Gencode Release 19. a) Same-strand and opposite-strand overlaps at the gene level; about 19 % of genes overlap with one or more genes on the opposite strand; b) the overlaps at the nucleotide base level. On average, the estimated overlaps on the same and opposite strands are 3 % and 3.6 %, respectively, in good agreement with the practical RNA-seq dataset shown in Fig. 3d
Fig. 5
Histograms and cumulative distributions for all pairs of overlapping genes. The ratio (i.e., the overlapping percentage) for each pair of genes is calculated by dividing the length of overlapping exons by the exon length of the shorter gene of the pair
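The pairwise ratio described in this caption can be computed from exon coordinates. The sketch below is illustrative only: it assumes each gene is a list of half-open exon intervals on one chromosome, that a gene's exon length is the size of the union of its exons, and that "shorter" means smaller exon length; `merge`, `intersection_length`, and `overlap_ratio` are hypothetical helper names.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping exon intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> int:
    return sum(end - start for start, end in intervals)


def intersection_length(a: List[Interval], b: List[Interval]) -> int:
    """Bases shared by two merged, sorted interval lists (two-pointer sweep)."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if hi > lo:
            shared += hi - lo
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def overlap_ratio(exons_a: List[Interval], exons_b: List[Interval]) -> float:
    """Overlapping exonic bases divided by the exon length of the shorter gene."""
    a, b = merge(exons_a), merge(exons_b)
    shorter = min(total_length(a), total_length(b))
    return intersection_length(a, b) / shorter if shorter else 0.0


gene_a = [(0, 100), (200, 300)]       # 200 exonic bases
gene_b = [(50, 250)]                  # 200 exonic bases, 100 shared with gene_a
print(overlap_ratio(gene_a, gene_b))  # 0.5
```

Normalizing by the shorter gene means a gene fully nested inside another (as ICAM4 is within CTD-2369P2.8) gets a ratio of 1.0.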
Fig. 6
Scatter plots of gene expression profiles between stranded and non-stranded RNA-seq. For samples PFE1, PFE2, PFE3, and PFE4, the scatter patterns are consistent. While the majority of genes lie along the diagonal, many genes still show expression levels that were dramatically affected by the sequencing protocol. The x- and y-axes represent log2(RPKM)
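The RPKM values on these axes follow the standard definition, reads per kilobase of exon per million mapped reads, i.e. RPKM = count × 10^9 / (exon length in bp × total mapped reads). A minimal sketch, assuming the denominator is the total number of mapped reads in the library and adding a pseudocount of 1 before the log transform (a hypothetical choice not stated in the caption) so that unexpressed genes remain plottable:

```python
import math


def rpkm(count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_rpkm(count: int, gene_length_bp: int, total_mapped_reads: int,
              pseudocount: float = 1.0) -> float:
    """log2(RPKM + pseudocount); the pseudocount (an assumption) avoids log2(0)."""
    return math.log2(rpkm(count, gene_length_bp, total_mapped_reads) + pseudocount)


# 500 reads on a 2 kb gene in a library of 10 million mapped reads
print(rpkm(500, 2000, 10_000_000))  # 25.0
```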
Fig. 7
Differential analysis results for the comparison between stranded and non-stranded RNA-seq. Every point in the plot corresponds to a gene. The x-axis represents the log2 fold change of stranded over non-stranded, while the y-axis (-log10(adjusted p value)) corresponds to the significance of a statistical test. All significant genes are colored in red. The criteria for significance are: (1) an adjusted p value < 0.05 (the horizontal dotted line); and (2) a fold change greater than 1.5 in either direction (the two vertical dotted lines)
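The two significance criteria can be expressed as a simple filter. This sketch assumes adjusted p values have already been produced by the differential expression tool (the caption does not name the adjustment method) and reads "fold change greater than 1.5" symmetrically, i.e. |log2 FC| > log2(1.5) ≈ 0.585, matching the two vertical lines. The gene names and values are made up for illustration.

```python
import math
from typing import List, Tuple

LOG2_FC_CUTOFF = math.log2(1.5)  # about 0.585: the two vertical dotted lines
P_CUTOFF = 0.05                  # adjusted p threshold: the horizontal dotted line


def is_significant(log2_fc: float, adj_p: float) -> bool:
    """True if a gene passes both criteria: adjusted p < 0.05 and a fold
    change above 1.5 in either direction."""
    return adj_p < P_CUTOFF and abs(log2_fc) > LOG2_FC_CUTOFF


def volcano_point(log2_fc: float, adj_p: float) -> Tuple[float, float]:
    """(x, y) coordinates of a gene on the volcano plot."""
    return log2_fc, -math.log10(adj_p)


results = [  # (gene, log2 fold change stranded/non-stranded, adjusted p); hypothetical
    ("gene_A", 1.0, 0.01),
    ("gene_B", -0.8, 0.001),
    ("gene_C", 0.3, 1e-6),   # highly significant but fold change too small
    ("gene_D", 2.0, 0.2),    # large fold change but not significant
]
significant: List[str] = [g for g, lfc, p in results if is_significant(lfc, p)]
print(significant)  # ['gene_A', 'gene_B']
```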
Fig. 8
The association between differential expression and gene overlap is gene-type dependent. a) The percentage of genes that are differentially expressed in each gene category; antisense genes and pseudogenes are enriched. The y-axis represents percentage. b) The dependency between differential expression and gene overlap from opposite strands. For protein-coding, antisense, and lincRNA gene types, the overlap is significantly higher in DE genes than in non-DE genes
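The per-category percentage in panel a is a straightforward grouped ratio. A minimal sketch, assuming each gene carries a Gencode-style biotype label (such as "protein_coding" or "antisense") and a boolean DE call; the toy labels below are hypothetical, and the statistical test behind panel b is not specified in the caption, so it is not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def de_percentage_by_type(genes: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Given (gene_type, is_differentially_expressed) pairs, return the
    percentage of DE genes within each gene type."""
    total: Counter = Counter()
    de: Counter = Counter()
    for gene_type, is_de in genes:
        total[gene_type] += 1
        if is_de:
            de[gene_type] += 1
    # Counter returns 0 for types with no DE genes
    return {t: 100.0 * de[t] / n for t, n in total.items()}


toy = [  # hypothetical labels; not data from the study
    ("antisense", True), ("antisense", False),
    ("protein_coding", True), ("protein_coding", False),
    ("protein_coding", False), ("protein_coding", False),
]
print(de_percentage_by_type(toy))  # {'antisense': 50.0, 'protein_coding': 25.0}
```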
Fig. 9
The gene expression of IL24, ICAM4, and GAPDH in stranded and non-stranded RNA-seq
Fig. 10
The mapping profiles for IL24 in Replicate PFE1. In non-stranded RNA-seq, all reads mapped to IL24 are counted regardless of whether they are on the forward or reverse strand. In stranded RNA-seq, however, nearly all reads map to the “+” strand and are therefore not counted, because these reads are not reverse complementary to IL24, which lies on the “+” strand. Moreover, the coverage pattern does not support the conclusion that the reads mapped to the IL24 genomic region truly originate from this gene. All genes, transcripts, and sequence reads are colored in blue if they are on the “+” strand and in green if they are on the “−” strand
Fig. 11
The mapping profiles for ICAM4 (intercellular adhesion molecule 4) in Replicate PFE1. The gene ICAM4 is on the “+” strand and is entirely contained within CTD-2369P2.8 on the “−” strand. In non-stranded RNA-seq, the ambiguous reads in overlapping regions are excluded from counting, which explains why no expression is detected for ICAM4. In stranded RNA-seq, however, these ambiguous reads can be fully resolved: by considering the read direction, all reads can be assigned to ICAM4 because they are reverse complementary to ICAM4 but not to CTD-2369P2.8
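The difference between the two counting modes in Figs. 10 and 11 can be illustrated with a toy read-assignment sketch. Every detail is a simplifying assumption: genes are single intervals rather than exon sets, reads are single-end dUTP read 1 alignments, and a read overlapping more than one candidate gene is discarded as ambiguous. This resembles, but is not claimed to be, the behavior of read counters such as HTSeq-count with reverse strandedness; the coordinates are made up.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, int, int]  # (strand, start, end), half-open coordinates
Read = Tuple[str, int, int]  # (alignment strand, start, end)


def candidate_genes(read: Read, genes: Dict[str, Gene], stranded: bool) -> List[str]:
    """Genes whose span overlaps the read. In stranded mode, keep only genes on
    the strand opposite the read, because a dUTP read 1 is assumed to be
    reverse complementary to the transcript it came from."""
    r_strand, r_start, r_end = read
    hits = []
    for name, (g_strand, g_start, g_end) in genes.items():
        if r_start < g_end and g_start < r_end:  # the intervals overlap
            if not stranded or g_strand != r_strand:
                hits.append(name)
    return hits


def count_reads(reads: List[Read], genes: Dict[str, Gene], stranded: bool) -> Dict[str, int]:
    """Assign each read to a single gene; reads matching several genes are
    tallied as ambiguous and excluded from every gene's count."""
    counts = {name: 0 for name in genes}
    counts["__ambiguous"] = 0
    counts["__no_feature"] = 0
    for read in reads:
        hits = candidate_genes(read, genes, stranded)
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts


# Toy layout loosely modeled on Fig. 11: a "+" gene nested in a "-" gene.
genes = {"ICAM4": ("+", 100, 200), "CTD-2369P2.8": ("-", 50, 300)}
reads = [("-", 110, 160), ("-", 140, 190), ("-", 150, 199)]
print(count_reads(reads, genes, stranded=False))
# {'ICAM4': 0, 'CTD-2369P2.8': 0, '__ambiguous': 3, '__no_feature': 0}
print(count_reads(reads, genes, stranded=True))
# {'ICAM4': 3, 'CTD-2369P2.8': 0, '__ambiguous': 0, '__no_feature': 0}
```

With a lone “+”-strand gene and “+”-strand reads, the same functions count the reads in non-stranded mode but drop them in stranded mode, mirroring the IL24 pattern in Fig. 10.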


Publication types