BMC Genomics. 2016 Mar 8:17:199.
doi: 10.1186/s12864-016-2539-z.

A novel enrichment strategy reveals unprecedented number of novel transcription start sites at single base resolution in a model prokaryote and the gut microbiome

Laurence Ettwiller et al. BMC Genomics. 2016.

Abstract

Background: The initiating nucleotide at the 5' end of primary transcripts carries a distinctive triphosphate that distinguishes these transcripts from all other RNA species. Recognizing this distinction is key to separating the primary transcriptome from the many processed transcripts that confound transcriptome analysis. Currently available methods do not use targeted enrichment for the 5' end of primary transcripts, but instead attempt to deplete non-targeted RNA.

Results: We developed a method, Cappable-seq, that directly enriches for the 5' end of primary transcripts and enables determination of transcription start sites (TSS) at single base resolution. This is achieved by enzymatically modifying the 5' triphosphorylated end of RNA with a selectable tag. We first applied Cappable-seq to E. coli, achieving up to 50-fold enrichment of primary transcripts and identifying an unprecedented 16539 TSS genome-wide at single base resolution. We also applied Cappable-seq to a mouse cecum sample and identified TSS in a microbiome.

Conclusions: Cappable-seq allows, for the first time, the capture of the 5' end of primary transcripts. This enables uniquely robust TSS determination in bacteria and microbiomes. Beyond TSS determination, Cappable-seq depletes ribosomal RNA and reduces the complexity of the transcriptome to a single quantifiable tag per transcript, enabling digital profiling of gene expression in any microbiome.

Figures

Fig. 1
Cappable-seq pipeline for TSS identification. a Schema of Cappable-seq protocol and the associated control library. b Replicate analysis. The correlation coefficient between replicate 1 and replicate 2 RRS is 0.983. c Enrichment score as a function of the mean of relative read score for the 36078 putative TSSs found in E. coli grown on minimal media. In blue are TSS that are enriched in Cappable-seq library. Grey are positions that are depleted in Cappable-seq. The removal of depleted positions eliminates 1354 spurious TSS primarily located in ribosomal loci
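
The caption for panel c refers to a relative read score (RRS) and an enrichment score but does not give their formulas. The sketch below is a minimal illustration under stated assumptions: RRS is taken to be the number of reads starting at a position per million mapped reads, and the enrichment score is taken to be the log2 ratio of the Cappable-seq RRS to the control RRS. The function names, the pseudocount, and the zero cutoff are hypothetical and are not taken from the abstract.

```python
import math


def relative_read_score(read_starts, total_mapped_reads):
    """Reads starting at one position, per million mapped reads (assumed definition)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return read_starts * 1e6 / total_mapped_reads


def enrichment_score(rrs_enriched, rrs_control, pseudo=0.01):
    """log2 ratio of enriched-library RRS to control-library RRS (assumed definition).

    `pseudo` is a hypothetical pseudocount that avoids division by zero
    when a position has no reads in the control library.
    """
    return math.log2((rrs_enriched + pseudo) / (rrs_control + pseudo))


def keep_enriched(candidates, min_score=0.0):
    """Keep candidate positions whose enrichment score exceeds `min_score`.

    candidates: dict mapping position -> (rrs_enriched, rrs_control).
    Returns a dict mapping each retained position to its enrichment score.
    """
    kept = {}
    for position, (rrs_enriched, rrs_control) in candidates.items():
        score = enrichment_score(rrs_enriched, rrs_control)
        if score > min_score:
            kept[position] = score
    return kept
```

With these assumptions, a position that is stronger in the control library than in the enriched library receives a negative score and is dropped, which mirrors the removal of depleted positions described in the caption.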
Fig. 2
Promoter regions. Characteristics of the promoter region found using Cappable-seq. a The average phastCons score is plotted for each position from −100 bases upstream to +30 bases downstream of the Cappable-seq TSS (position 0) and the Cappable-seq specific TSS. b Sequence logo upstream of all Cappable-seq TSS and Cappable-seq specific TSS. c Over-represented motifs found in the promoter regions of the Cappable-seq and Cappable-seq specific datasets. d Fraction of promoters having the sigma 70 −10 motif in the composite dataset of known TSS, Cappable-seq TSS, TSS common to Cappable-seq and the composite dataset of known TSS, and Cappable-seq specific TSS
Fig. 3
Nucleotide preference at TSS. a Sequence logo of the nucleotide bias from position −2 to +2 of the TSS. b Distribution of TSS strength (in RRS in Cappable-seq), classified according to the −1+1 configuration, with R being purine (A or G) and Y being pyrimidine (C or T). c Relative abundance of reads for each of the 16 possible TSS −1+1 dinucleotides. Blue boxes are YR motifs, green boxes are YY or RR motifs and pink boxes are RY motifs. Percentages correspond to the percentage of TSS having the given −1+1 configuration. d Over-represented motifs at −35 and −10 bp upstream of TSS with the −1C+1C dinucleotide configuration
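
The −1+1 classification in panels b and c can be sketched in a few lines. This is an illustrative helper, not the authors' code: it assumes a forward-strand sequence held as a Python string, a 0-based index for the TSS (the +1 base), and unambiguous A/C/G/T bases. The names `dinucleotide_at`, `classify_dinucleotide`, and `tally_classes` are hypothetical.

```python
from collections import Counter

PURINES = {"A", "G"}      # R
PYRIMIDINES = {"C", "T"}  # Y


def base_class(base):
    """Return 'R' for a purine and 'Y' for a pyrimidine."""
    base = base.upper()
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    raise ValueError(f"unsupported base: {base!r}")


def dinucleotide_at(sequence, tss_index):
    """Return the -1+1 dinucleotide for a forward-strand TSS.

    tss_index is the 0-based index of the +1 base (the TSS itself);
    the -1 base is the one immediately upstream.
    """
    if tss_index < 1 or tss_index >= len(sequence):
        raise ValueError("TSS needs one upstream base inside the sequence")
    return sequence[tss_index - 1:tss_index + 1].upper()


def classify_dinucleotide(dinucleotide):
    """Map a -1+1 dinucleotide such as 'CA' to 'YR', 'RY', 'RR' or 'YY'."""
    minus1, plus1 = dinucleotide
    return base_class(minus1) + base_class(plus1)


def tally_classes(sequence, tss_indices):
    """Count TSS per -1+1 class for a list of forward-strand TSS indices."""
    return Counter(
        classify_dinucleotide(dinucleotide_at(sequence, i)) for i in tss_indices
    )
```

A reverse-strand TSS would need the reverse complement of the surrounding sequence before classification; that step is left out here to keep the sketch short.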
Fig. 4
Intragenic TSS. a Distribution of the number of sense and antisense intragenic TSS as a function of the position within genes. b Box plot representing the distribution of the TSS strength (RRS score) for intergenic (red), sense intragenic (blue) and antisense intragenic (grey) TSS. c Distribution of intragenic sense (blue) and antisense (grey) TSS strength as a function of their position within genes
Fig. 5
Positional preference of TSS relative to codon. Frequency of intragenic TSS relative to the first, second and third position of the codon for (a) the sense TSS and (b) the antisense TSS. Graphics on the left represent the overall frequency of TSS at each codon position across the entire gene length, while the graphics on the right represent the frequency of TSS at each codon position as a function of the relative position within the coding gene (in 10 % increments of the total gene length)
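
The codon position and the 10 % length bin used in this figure can be computed from gene coordinates alone. The following is a minimal sketch under assumptions that are not stated in the caption: coordinates are 0-based and inclusive, the gene's reading frame begins at its first base, and the codon position is always measured in the gene's own frame, whether the TSS is sense or antisense. The function name `codon_position_and_bin` is hypothetical.

```python
def codon_position_and_bin(tss, gene_start, gene_end, gene_strand):
    """Locate an intragenic TSS within the reading frame of its host gene.

    tss, gene_start, gene_end: 0-based inclusive genomic coordinates.
    gene_strand: '+' or '-' (strand of the gene, not of the TSS).

    Returns (codon_position, length_bin) where codon_position is 1, 2 or 3
    and length_bin is 0..9, the 10 % increment of gene length containing
    the TSS, counted from the start of the coding sequence.
    """
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    if not gene_start <= tss <= gene_end:
        raise ValueError("TSS is not inside the gene")

    gene_length = gene_end - gene_start + 1
    # Distance from the first base of the coding sequence, in the
    # direction of translation.
    if gene_strand == "+":
        offset = tss - gene_start
    else:
        offset = gene_end - tss

    codon_position = offset % 3 + 1
    length_bin = offset * 10 // gene_length
    return codon_position, length_bin
```

For example, in a 300 bp forward-strand gene spanning positions 100 to 399, a TSS at position 104 falls on the second codon position in the first 10 % of the gene.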
Fig. 6
TSS of mouse gut microbiome. Analysis of TSS for four representative species across four phyla of bacteria. a IGV display of read distribution in Akkermansia muciniphila in both biological replicates. b Promoter structures in all four species generated with WebLogo (for biological replicate 1). The X axis represents the distance from the TSS found by Cappable-seq. The Y axis represents the amount of information present at every position in the sequence, measured in bits. c Percentage of leaderless TSS in replicate 1. d Read genomic distribution for replicate 1. e The correlation coefficient of relative read score (RRS) of TSS in the four representative species between the two biological replicates (two mouse gut microbiomes) is 0.81
