2013 Jul 30:14:520.
doi: 10.1186/1471-2164-14-520.

Directional RNA-seq reveals highly complex condition-dependent transcriptomes in E. coli K12 through accurate full-length transcripts assembling

Shan Li et al. BMC Genomics.

Abstract

Background: Although prokaryotic gene transcription has been studied for decades, many aspects of the process remain poorly understood. In particular, recent studies have revealed that transcriptomes in many prokaryotes are far more complex than previously thought. Genes in an operon are often alternatively and dynamically transcribed under different conditions, and a large portion of genes and intergenic regions have antisense RNA (asRNA) and non-coding RNA (ncRNA) transcripts, respectively. Ironically, similar studies have not been conducted in the model bacterium E. coli K12, so it is unknown whether the bacterium possesses similarly complex transcriptomes. Furthermore, although RNA-seq has become the major method for analyzing the complexity of prokaryotic transcriptomes, accurately assembling full-length transcripts from short RNA-seq reads remains a challenging task.

Results: To fill these gaps, we profiled the transcriptomes of E. coli K12 under different culture conditions and growth phases using a highly specific directional RNA-seq technique that can capture various types of transcripts in bacterial cells, combined with TruHMM (http://bioinfolab.uncc.edu/TruHmm_package/), a highly accurate and robust algorithm and tool for assembling full-length transcripts. We found that 46.9% to 63.4% of expressed operons were utilized in their putative alternative forms, 72.23% to 89.54% of genes had putative asRNA transcripts, and 51.37% to 72.74% of intergenic regions had putative ncRNA transcripts under different culture conditions and growth phases.

Conclusions: As has been demonstrated in many other prokaryotes, E. coli K12 also has highly complex and dynamic transcriptomes under different culture conditions and growth phases. Such complex and dynamic transcriptomes might play important roles in the physiology of the bacterium. TruHMM is a highly accurate and robust algorithm for assembling full-length transcripts in prokaryotes using directional RNA-seq short reads.

Figures

Figure 1
Strand specificity of the directional RNA-seq libraries. The percentage of total nucleotides mapped to the sense strand, the antisense strand, and intergenic regions is shown for the seven samples.
Figure 2
Distribution of the genes with more than the indicated percentage of their length covered by at least one read in the samples. Fewer than 60% of genes have their length completely covered by at least one read. Over 80% of genes have over 50% of their length covered by at least one read, except in sample HS60 min.
Figure 3
Reads are biased toward the 5’ end of operons. Each sufficiently expressed known multiple-gene operon (Additional file 2) and singleton operon is divided equally into 20 bins, and the average expression value in each bin across all operons in each sample is displayed. The top 10% most highly expressed genes were excluded from the calculation.
Figure 4
Cumulative distributions of the length of interoperonic regions and the length of gaps in sufficiently expressed regions.
Figure 5
Evaluation of the algorithm based on operon pairs in the seven samples. The dashed horizontal line is at the 95.87% level, and the vertical bars indicate standard errors.
Figure 6
Evaluation of the algorithm based on entire operon structures in the seven samples. The dashed horizontal line is at the 95.3% level, and the vertical bars indicate standard errors.
Figure 7
Distribution of the Pribnow box start position relative to predicted TSS appearing in multiple samples (black dots) or in a single sample (red dots).
Figure 8
Position-dependent non-uniform coverage of the reads along the hem operon hemCDXY. The vertical axis is the number of reads covering each position. The orange and dark green bars at the bottom of the graph represent the reverse and forward strands, respectively. Segments with arrows represent genes. The graphs were generated using IGB. To make the expression levels of the four genes comparable across samples, the same vertical axis scale (1,200) is used for all the samples. Although this four-gene operon was contiguously covered by reads under different cultures and growth phases, the samples show highly similar patterns of position-dependent non-uniform read coverage along the operon.
Figure 9
Read coverage of the genes in the phn operon. The vertical axis is the number of reads covering each position. The orange and dark green bars represent the forward and reverse strands, respectively. Segments with arrows represent genes. Genes from right to left are yjdN, phnC, phnD, phnE, phnF, phnG, phnH, phnI, phnJ, phnK, phnL, phnM, phnN, phnO and phnP. The graphs were generated using IGB. To make the expression levels of the 14 genes visible and comparable across samples, the same vertical axis scale (50) is used for the LB and HS treatments, and the same vertical axis scale (450) is used for the M-P treatments. Some positions with low read coverage cannot be shown, while some positions with high coverage are truncated. Note the varying levels of coverage and the gaps along the operon under different cultures and growth phases, and again the similar position-dependent non-uniform read coverage along the operon.
Figure 10
Distribution of the lengths of assembled asRNAs and ncRNAs. For clarity, only the range of 1 to 400 nt is shown, but some asRNAs can be longer than 1,000 nt.
Figure 11
QQ-plot comparing the distribution of centroid coverage values of the positive training set in all the samples but LB with the fitted Poisson distribution. Deviation of a data point from the line y = x indicates its deviation from the theoretical Poisson distribution. Parameters of the Poisson distribution are estimated using the maximum likelihood method.
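The Poisson fit described in this caption can be illustrated with a minimal sketch. It relies only on the standard fact that the maximum likelihood estimate of a Poisson rate is the sample mean; the function names, the plotting positions (i + 0.5)/n, and the toy coverage values are assumptions for illustration and are not taken from the paper or from TruHMM.

```python
import math

def poisson_mle(counts):
    """Maximum likelihood estimate of the Poisson rate: the sample mean."""
    if not counts:
        raise ValueError("counts must not be empty")
    return sum(counts) / len(counts)

def poisson_quantile(prob, lam):
    """Smallest k whose Poisson(lam) CDF is at least prob (0 < prob < 1).

    Suitable for moderate lam only: exp(-lam) underflows for very large rates.
    """
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < prob:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k

def qq_points(counts):
    """Pairs of (theoretical quantile, observed value) for a QQ-plot."""
    lam = poisson_mle(counts)
    observed = sorted(counts)
    n = len(observed)
    return [(poisson_quantile((i + 0.5) / n, lam), obs)
            for i, obs in enumerate(observed)]

# Hypothetical coverage values, not data from the paper.
toy_counts = [2, 3, 4, 3, 3]
points = qq_points(toy_counts)
```

Points that fall on the line y = x indicate agreement with the fitted distribution; systematic departures in the upper tail would suggest overdispersion relative to a Poisson model.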
Figure 12
Impact of highly expressed genes on the mapped nucleotides in coding regions. Genes were sorted in descending order of the number of mapped nucleotides in their reads. The top 10% of genes with the highest read counts contribute around 80% to 90% of the mapped nucleotides in the coding regions.
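The share computed in this caption is simple arithmetic, sketched below. The function name and the toy per-gene totals are hypothetical; the paper's actual per-gene counts are not reproduced here.

```python
def top_fraction_share(mapped_nt_per_gene, fraction=0.10):
    """Share of all mapped nucleotides contributed by the top `fraction` of genes."""
    ranked = sorted(mapped_nt_per_gene, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no mapped nucleotides")
    # Always include at least one gene so that tiny inputs still work.
    n_top = max(1, int(len(ranked) * fraction))
    return sum(ranked[:n_top]) / total

# Hypothetical per-gene totals: one dominant gene among ten.
toy_totals = [820] + [20] * 9
share = top_fraction_share(toy_totals)
```

In this toy input the single top gene accounts for 820 of 1,000 mapped nucleotides, so `share` is 0.82.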
Figure 13
Structure of the HMM for assembling operons/transcripts using RNA-seq reads. E represents the expression state and N the non-expression state. r1, r2, …, rn are the emission values of E, and μcontig is the mean length of sufficiently expressed contigs in the positive training set; s1, s2, …, sN are the emission values of N, and μzero is the mean length of the non-expressed regions in the negative training set.
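To make the two-state structure concrete, here is a minimal Viterbi sketch, not TruHMM's implementation. It assumes, for illustration only, that each state's self-transition probability is 1 - 1/μ (so that the dwell time in a state is geometric with mean μ), that both states emit per-position read coverage from Poisson distributions, and that the two states are equally likely at the first position. The rates `lam_e` and `lam_n` and all numeric values are hypothetical.

```python
import math

def poisson_logpmf(k, lam):
    """Log probability of observing count k under Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def viterbi_two_state(coverage, mu_contig, mu_zero, lam_e, lam_n):
    """Most likely E/N labeling of a coverage track under a two-state HMM."""
    if not coverage:
        return ""
    states = ("E", "N")
    # Assumed link between mean segment length and self-transition probability.
    stay = {"E": 1.0 - 1.0 / mu_contig, "N": 1.0 - 1.0 / mu_zero}
    rate = {"E": lam_e, "N": lam_n}

    def log_trans(a, b):
        return math.log(stay[a]) if a == b else math.log(1.0 - stay[a])

    # Uniform start distribution (an assumption).
    score = {s: math.log(0.5) + poisson_logpmf(coverage[0], rate[s])
             for s in states}
    back = []
    for k in coverage[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda q: score[q] + log_trans(q, s))
            pointers[s] = prev
            new_score[s] = (score[prev] + log_trans(prev, s)
                            + poisson_logpmf(k, rate[s]))
        back.append(pointers)
        score = new_score

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))

# Hypothetical coverage track and parameters, not values from the paper.
toy_coverage = [0, 0, 0, 12, 15, 9, 11, 0, 0, 0]
labels = viterbi_two_state(toy_coverage, mu_contig=1500, mu_zero=170,
                           lam_e=10.0, lam_n=0.1)
```

With these toy parameters the covered positions are labeled E and the flanking zero-coverage positions N, giving `"NNNEEEENNN"`. Long mean lengths make state switches expensive, which is what lets such a model bridge short coverage gaps inside a transcript.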
Figure 14
Selection of known adjacent operon pairs for training and evaluation. A: The intergenic region between two adjacent genes in an operon is doubled by extending its two ends in the two flanking genes. B: A sufficiently expressed gene is equally divided into n bins, and its central half is further equally divided into n bins. The NPKB values for each bin of a gene and of its central portion are a1,…,ai,…, aj, …, an and b1,…, bi,…, bj,…, bn, respectively. An extended intergenic region is similarly divided by treating it as a “gene” with the intergenic region being the central portion of the “gene”. C: Distribution of PCC values between the two vectors for sufficiently expressed genes with a bin size n = 4. We choose 0.3 as the cutoff of PCC value since 60.1% of sufficiently expressed genes can be included.
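The comparison in panels B and C can be sketched as follows. This is an illustration under stated assumptions: per-bin mean coverage stands in for the NPKB values (whose exact definition is not given in this excerpt), the "central half" is taken as the middle 50% of positions, and the function names and toy coverage are hypothetical.

```python
import math

def bin_means(values, n_bins):
    """Average `values` over n_bins consecutive, nearly equal-sized bins."""
    size = len(values)
    if size < n_bins:
        raise ValueError("need at least one position per bin")
    edges = [i * size // n_bins for i in range(n_bins + 1)]
    return [sum(values[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]

def central_half(values):
    """Middle 50% of the positions."""
    quarter = len(values) // 4
    return values[quarter:len(values) - quarter]

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a flat profile as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def profile_pcc(coverage, n_bins=4):
    """PCC between the binned whole-region and binned central-half profiles."""
    whole = bin_means(coverage, n_bins)
    centre = bin_means(central_half(coverage), n_bins)
    return pearson(whole, centre)

# Hypothetical coverage rising steadily along a 40-nt region.
toy_coverage = list(range(1, 41))
pcc = profile_pcc(toy_coverage)
passes_cutoff = pcc >= 0.3  # cutoff value taken from the caption
```

An extended intergenic region would be scored the same way by passing its coverage as the "gene", so that the original intergenic region forms the central portion.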
Figure 15
Distributions of the lengths of sufficiently expressed contigs and non-expressed regions in all the samples except LB. A: Histogram of the lengths of sufficiently expressed contigs (bin size = 50 nt). The curve is the geometric distribution with the success probability p = 0.0006503 estimated by the maximum likelihood method. The inset is a blow-up view of the region of length 1 to 7,000 nt. B: Histogram of the lengths of non-expressed regions (bin size = 50 nt). The curve is the geometric distribution with p = 0.00577 estimated by the maximum likelihood method. C: QQ-plot of the lengths of the sufficiently expressed contigs against the fitted geometric distribution. D: QQ-plot of the lengths of non-expressed regions against the fitted geometric distribution.
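As a worked illustration of the fit in panels A and B: for a geometric distribution on lengths 1, 2, 3, …, the maximum likelihood estimate of the success probability is the reciprocal of the sample mean, so each reported p implies a mean length of 1/p. The sketch below assumes that parameterization (the caption does not state which one was used); the function name and toy lengths are hypothetical.

```python
def geometric_mle(lengths):
    """MLE of p for a geometric distribution supported on 1, 2, 3, ..."""
    if not lengths or min(lengths) < 1:
        raise ValueError("lengths must be positive")
    return len(lengths) / sum(lengths)  # equals 1 / sample mean

# Hypothetical contig lengths in nt, not data from the paper.
toy_lengths = [1200, 800, 2500, 1500]
p_hat = geometric_mle(toy_lengths)

# Mean lengths implied by the p values quoted in the caption.
mean_expressed_nt = 1.0 / 0.0006503      # about 1,538 nt
mean_non_expressed_nt = 1.0 / 0.00577    # about 173 nt
```

Under this reading, sufficiently expressed contigs average roughly 1.5 kb while non-expressed regions average under 200 nt.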
