Mol Syst Biol. 2016 Jun 10;12(6):874.
doi: 10.15252/msb.20166869.

Strand-specific, high-resolution mapping of modified RNA polymerase II

Laura Milligan et al. Mol Syst Biol. 2016.

Abstract

Reversible modification of the RNAPII C-terminal domain links transcription with RNA processing and surveillance activities. To better understand this, we mapped the location of RNAPII carrying the five types of CTD phosphorylation on the RNA transcript, providing strand-specific, nucleotide-resolution information, and we used a machine learning-based approach to define RNAPII states. This revealed enrichment of Ser5P, and depletion of Tyr1P, Ser2P, Thr4P, and Ser7P in the transcription start site (TSS) proximal ~150 nt of most genes, with depletion of all modifications close to the poly(A) site. The TSS region also showed elevated RNAPII relative to regions further 3', with high recruitment of RNA surveillance and termination factors, and correlated with the previously mapped 3' ends of short, unstable ncRNA transcripts. A hidden Markov model identified distinct modification states associated with initiating, early elongating and later elongating RNAPII. The initiation state was enriched near the TSS of protein-coding genes and persisted throughout exon 1 of intron-containing genes. Notably, unstable ncRNAs apparently failed to transition into the elongation states seen on protein-coding genes.

Keywords: hidden Markov model; polymerase CTD phosphorylation; transcription; yeast.

Figures

Figure EV1
Figure EV1. Rpo21 crosslinking and purification during mCRAC
The schematic indicates the major steps in the mCRAC protocol. The inset gel shows two replicates of the gel purification of Rpo21, with antibodies against the modifications indicated or a no antibody control (NAb), visualized by autoradiography of the 5′ [32P] labeled, crosslinked RNA.
Figure 1
Figure 1. RNAPII can be mapped with high resolution using CRAC
  A. Outline of the mCRAC protocol. See Fig EV1 for further details. In all figures, the analyses used S. cerevisiae strains derived from BY4741.

  B. Distribution of RNAPII reads across transcript classes determined by CRAC analyses of Rpo21‐HTP.

  C. Distribution of RNAPII across protein‐coding genes in the sense and antisense orientations. In the upper panel, the vertical line indicates the TSS. The curved line indicates the location of the poly(A) site. All protein‐coding genes are shown in the sense orientation, ordered with the shortest ORF at the top. The lower panel shows reads that are antisense to the same regions.

  D. Ratio of spliced to unspliced RNAs in RNAPII‐bound RNAs, calculated as the ratio of sequences spanning exon–exon (spliced) relative to intron–exon (unspliced) junctions.

  E. Peaks in RNAPII binding correlate with nucleosome positions. The zero point (solid vertical line) marks the mapped positions of nucleosome 5′ boundaries (Jiang & Pugh, 2009) across all protein‐coding genes. The red line shows the overall RNAPII density with respect to each nucleosome boundary. Dashed lines show the locations of RNAPII maxima, which show an apparent 150 nt periodicity.

Figure EV2
Figure EV2. Distribution of RNAPII with respect to gene features
RNAPII signals drop around the position of the translation termination codon. The graphs show the site of the termination codon as 0 plus 500 nt flanking regions. The density of crosslinking over all protein‐coding genes is shown for Rpo21 (total RNAPII), as well as the poly(A)‐binding protein Pab1 and the cytoplasmic degradation factor Xrn1 (data from Tuck & Tollervey, 2013). The frequencies of the polyadenylation signals, AWTAAA and TATATA, are also indicated. Nucleosome distribution: The curve shows the fraction of transcripts where a nucleosome matches the corresponding position.
Figure 2
Figure 2. Binding by RNA surveillance and degradation factors is strongly enriched close to the TSS on protein‐coding genes
  A–J. Each panel shows the hit density, normalized to the maximal binding value across all mRNA genes longer than 500 nt inside each individual experiment. The two lines in each panel represent results from independent CRAC experiments. The TSS‐proximal 150 nt region is shaded. (A) Rpo21 (RNAPII); (B) Rrp44; (C) Rrp6; (D) Trf4; (E) Air2; (F) Nab3; (G) Mex67; (H) Distribution of the 3′ ends of short, promoter‐proximal, sense‐orientated ncRNA transcripts (S CUTs); (I) Nab2; (J) Hrp1. Sequence data sources: (A–E) this work; (F) Holmes et al, 2015; (G, I, J) Tuck & Tollervey, 2013; (H) Neil et al, 2009.
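The normalization described in this caption (scaling each profile to its maximal binding value) can be sketched as follows. This is a minimal illustration only: the helper name `normalize_to_max` and the hit counts are hypothetical and are not taken from the paper's analysis code.

```python
def normalize_to_max(hit_counts):
    """Scale a hit-density profile so that its largest value becomes 1.0.

    Hypothetical helper for illustration; not the authors' pipeline.
    """
    peak = max(hit_counts)
    if peak == 0:
        # An empty profile stays flat instead of dividing by zero.
        return [0.0 for _ in hit_counts]
    return [count / peak for count in hit_counts]


# Made-up hit counts along a transcript, 5' to 3'.
profile = [2, 8, 4, 1]
normalized = normalize_to_max(profile)
```

Normalizing within each experiment puts profiles from factors with very different crosslinking efficiencies on a common 0 to 1 scale, so their shapes can be compared directly.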

Figure 3
Figure 3. Profiles of RNAPII phosphorylation on mRNA and ncRNA genes can be generated using mCRAC
  A. Distribution of RNAPII phosphorylation across protein‐coding genes aligned at the TSS, as in Fig 1C, as determined by mCRAC analyses on Rpo21‐HTP. Red color indicates depletion, and green color indicates enrichment of phosphorylation relative to total RNAPII.

  B. As panel (A) but with genes aligned at the polyadenylation site.

  C. Metagene analysis of RNAPII phosphorylation enrichment relative to the 5,000 strongest RNAPII pause sites in mRNA genes, identified by NET‐Seq (Churchman & Weissman, 2011).

  D. Metagene analysis of RNAPII phosphorylation enrichment relative to transcription start sites, calculated for all mRNA genes. The TSS‐proximal 150 nt region, where Ser5P is enriched and Ser2P and Tyr1P are depleted, is indicated with a dashed line.

  E. Metagene analysis of RNAPII phosphorylation on CUTs as for (D).

  F. Metagene analysis of RNAPII phosphorylation on SUTs as for (D).

Figure EV3
Figure EV3. Distribution of RNAPII phosphorylation across mRNAs and non‐coding RNA genes
Distribution of RNAPII phosphorylation as in Fig 3A, across all protein‐coding genes, compared to the SUT, CUT, and snoRNA classes of ncRNA. Red color indicates depletion, and green color indicates enrichment of phosphorylation relative to total RNAPII. The graph below each panel shows a metagene analysis of RNAPII phosphorylation enrichment for all genes in each class.
Figure EV4
Figure EV4. Comparison of distributions of phosphorylation enrichment between different transcript classes
  A. Distributions of phosphorylation enrichment on individual mRNAs, CUTs, and SUTs. The boxes show the median and interquartile range of phosphorylation enrichment scores for mRNAs (N = 5,171), CUTs (N = 925), and SUTs (N = 847).

  B. Distributions of phosphorylation enrichment in the first 500 nt of mRNAs, CUTs, and SUTs. Only genes with length greater than 500 nt were analyzed. The boxes show the median and interquartile range of phosphorylation enrichment scores for mRNAs (N = 5,040), CUTs (N = 294), and SUTs (N = 563).

Data information: The whiskers indicate 1.5 times the interquartile range, and the asterisks indicate statistical significance (P < 0.01, Wilcoxon test with Bonferroni correction, N = 30).
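The significance criterion stated above can be made concrete with a small sketch. It assumes that "N = 30" is the number of comparisons entering the Bonferroni correction, which the caption does not spell out. The function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value cutoff under a Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, alpha=0.01, n_tests=30):
    """True if an uncorrected P-value passes the corrected cutoff.

    Defaults follow the caption (P < 0.01, N = 30); treating N as the
    number of tests is an assumption made for this illustration.
    """
    return p_value < bonferroni_threshold(alpha, n_tests)


cutoff = bonferroni_threshold(0.01, 30)  # roughly 3.3e-4
```

Under this reading, an individual Wilcoxon test would need an uncorrected P-value below about 0.00033 to be marked with an asterisk.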
Figure 4
Figure 4. HMM analyses of phosphorylation state distributions
  A. Metagene analysis of frequency distribution for each state on all protein‐coding genes. See expanded view for analyses of replicate datasets and statistical analyses.

  B. Genome browser views showing the distribution of the 8 states in the HMM over an unspliced gene (LCP5) and a spliced gene (RPS0B).

  C. Learned emission matrix of the 8‐state HMM. Each row shows the average log‐enrichment levels of the different phosphorylated forms of Rpo21 over total RNAPII in one of the states.

  D. Comparison of the locations of the 3′ end of initiation state I1 with the 3′ boundary of nucleosome 1.

  E. Comparison of the locations of the 5′ end of early elongation state EE with the 5′ boundary of nucleosome 2.

  F–H. The presence of an intron is associated with displacement of phosphorylation state boundaries. The graphs show state fold enrichment for each state over protein‐coding genes lacking an intron (F), containing short exon 1 (< 100 nt) regions (G) or long (> 100 nt) exon 1 regions (H). For each panel, the length of each gene has been divided into ten bins to allow the combination of genes with different lengths. T, Transcript; E1, Exon 1; I, Intron; E2, Exon 2. Intron‐containing genes in yeast are generally highly expressed, and the top quartile of intronless genes was therefore taken for comparison.
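Dividing each gene into ten bins, as described for panels F–H, lets profiles from genes of different lengths be averaged position by position. A minimal sketch of such length normalization is given below; the function `bin_profile` and the input values are hypothetical, and the paper's exact binning procedure may differ.

```python
def bin_profile(values, n_bins=10):
    """Average a per-nucleotide profile into n_bins equal-width bins.

    Hypothetical helper for illustration. Bin edges are computed with
    integer arithmetic so that every position falls in exactly one bin.
    """
    length = len(values)
    if length < n_bins:
        raise ValueError("profile is shorter than the number of bins")
    binned = []
    for b in range(n_bins):
        start = b * length // n_bins
        end = (b + 1) * length // n_bins
        chunk = values[start:end]
        binned.append(sum(chunk) / len(chunk))
    return binned


# Two made-up genes of different lengths reduce to profiles of equal length.
short_gene = bin_profile(list(range(20)))
long_gene = bin_profile([1.0] * 25)
```

After binning, every gene contributes a vector of the same length, so a metagene curve is simply the per-bin mean across genes.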

Figure EV5
Figure EV5. HMM transition matrix, reproducibility of results, and state enrichment analysis
  A. Plot showing the mean squared error with respect to the number of states in the HMM.

  B. Learned state transition matrix for 8‐state HMM. Each entry (i,j) of the matrix shows the probability of transitioning from hidden state i to j at any position along the genome.

  C. State distributions across the first quartile of protein‐coding genes from two independent mCRAC analyses.

  D. State distributions across the first quartile of protein‐coding genes relative to the positions of nucleosomes 1–5 (N1–N5) on these genes.
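To illustrate how a transition matrix and per-state emission levels jointly assign a state to each position, the sketch below runs standard Viterbi decoding on a toy model. Everything in it is an assumption made for illustration: it uses two states rather than eight, a single made-up log-enrichment track, unit-variance Gaussian emissions, and invented probabilities. It is not the authors' model or fitting procedure, and the source does not say which decoding method was used.

```python
import math


def viterbi(observations, states, log_start, log_trans, means, sigma=1.0):
    """Return the most probable state path for a 1-D Gaussian-emission HMM."""

    def log_emit(state, x):
        # Gaussian log-density, dropping a constant shared by all states.
        return -((x - means[state]) ** 2) / (2 * sigma ** 2)

    # best[t][s]: log-score of the best path that ends in state s at position t.
    best = [{s: log_start[s] + log_emit(s, observations[0]) for s in states}]
    back = []
    for x in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit(s, x)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy parameters (all invented): "init" has enriched signal, "elong" depleted.
states = ["init", "elong"]
means = {"init": 1.0, "elong": -1.0}
log_start = {"init": math.log(0.9), "elong": math.log(0.1)}
log_trans = {
    "init": {"init": math.log(0.9), "elong": math.log(0.1)},
    "elong": {"init": math.log(0.01), "elong": math.log(0.99)},
}
track = [1.2, 0.8, 1.1, -0.9, -1.2, -1.0]
path = viterbi(track, states, log_start, log_trans, means)
```

The low probability of returning from "elong" to "init" in this toy transition matrix is what makes the decoded path switch once and then stay, mirroring the ordered progression of states along a gene.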

Figure EV6
Figure EV6. Comparison of the effects of different state numbers in the HMM
  A. Emission matrices for 6‐, 8‐, and 10‐state HMM models.

  B–D. (B) 6‐state HMM model; (C) 8‐state HMM model; (D) 10‐state HMM model. In each panel, the left graph (I) shows the top quartile of protein‐coding genes, the middle graph (II) shows spliced genes with short exon 1 regions (< 100 nt) and the right graph (III) shows spliced genes with long exon 1 regions (> 100 nt). In each model, preferential initiation and elongation states are clearly resolved. In all models, the initiation state is clearly extended further 3′ on intron‐containing genes.

Figure EV7
Figure EV7. The presence of an intron is associated with displacement of phosphorylation state boundaries
  A. Average state probabilities shown individually for each state over protein‐coding genes lacking an intron (intronless) or with short (< 100 nt) or long (> 100 nt) exon 1 regions, aligned by the transcription start site (TSS).

  B, C. Average state probabilities shown individually for each state over protein‐coding genes with short (< 100 nt) or long (> 100 nt) exon 1 regions, aligned by 5′ splice site (5′SS; panel B) or 3′ splice site (3′SS; panel C).

  D. Distribution of phosphorylated RNAPII relative to the 3′ splice site (3′SS) in intron‐containing genes.

Figure 5
Figure 5. Metagene HMM analyses of phosphorylation states on protein‐coding and ncRNA genes
  A. Comparison of state frequencies over expression‐matched protein‐coding mRNAs, SUTs, and CUTs.

  B. Comparison of state distributions over expression‐matched protein‐coding mRNAs and CUTs as in (A).

  C. The failure of lncRNAs to exit initiation state I1 is not a consequence of short length. The curves show the positions at which RNAPII exits initiation state I1 (red curve) and enters early elongation state EE (black curve), relative to the lengths of CUTs (green line) and SUTs (blue line).

Figure 6
Figure 6. Model comparing the state transitions on coding and non‐coding RNA transcripts
Yeast genes typically have a well‐ordered nucleosome close to the transcription start site. On both mRNA and ncRNA genes, RNAPII is generally in an initiation state (I1 and I2; shown as I1) while traversing the first nucleosome. This state favors recruitment of nuclear RNA surveillance machinery, including the NNS termination complex and the TRAMP–exosome degradation system. On mRNAs, RNAPII then transitions into the early elongation state (EE) followed by the major elongation states (E1 to E3; shown as E1). On intron‐containing mRNAs, the transition from the initiation to the early elongation state is displaced further 3′. On ncRNAs, the initiation state persists, and the failure to transition to the elongation states (EE and E1) favors termination and transcript degradation.
