Nat Methods. 2013 Feb;10(2):133-9.
doi: 10.1038/nmeth.2288. Epub 2012 Dec 16.

Analysis of alternative cleavage and polyadenylation by 3' region extraction and deep sequencing

Mainul Hoque et al. Nat Methods. 2013 Feb.

Abstract

Alternative cleavage and polyadenylation (APA) generates diverse mRNA isoforms. We developed 3' region extraction and deep sequencing (3'READS) to address mispriming issues that commonly plague poly(A) site (pA) identification, and we used the method to comprehensively map pAs in the mouse genome. Thorough annotation of gene 3' ends revealed over 5,000 previously overlooked pAs (∼8% of total) flanked by A-rich sequences, underscoring the necessity of using an accurate tool for pA mapping. About 79% of mRNA genes and 66% of long noncoding RNA genes undergo APA, but these two gene types have distinct usage patterns for pAs in introns and upstream exons. Quantitative analysis of APA isoforms by 3'READS indicated that promoter-distal pAs, regardless of intron or exon locations, become more abundant during embryonic development and cell differentiation and that upregulated isoforms have stronger pAs, suggesting global modulation of the 3' end-processing activity in development and differentiation.

Conflict of interest statement

Competing Financial Interests

US Patent application PCT/US2012/052122 based on this work is pending.

Figures

Figure 1. Mapping pAs by 3′READS
(a) Schematic of the 3′READS method. See Methods for details. (b) Optimization of washing conditions to enrich RNAs with long poly(A) tails. Radioactively labeled A15 and A60 RNAs were synthesized by in vitro transcription using SP6 RNA polymerase. The X-ray film image shows the eluted RNA after RNase H digestion. The A60/A15 ratio indicates the difference in amount between eluted A60 and A15 RNAs. (c) Reads generated by 3′READS using the CU5T45 oligo or oligo(dT)10–25 (see Methods for details). Top, schematic showing alignment of a read to genomic DNA. The last aligned position (LAP) and the putative pA are indicated by arrows. Bottom, distribution of three types of reads: 1) reads with ≥ 2 As immediately downstream of the LAP, which were used for pA identification and were called polyA site supporting (PASS) reads; 2) reads with < 2 As immediately downstream of the LAP, where the LAP is near a pA (≤ 24 nt); 3) same as 2) except that the LAP is not near a pA (> 24 nt). (d) Nucleotide profiles around the LAP (set to position 0), as illustrated in (c). Top panels are reads generated by CU5T45 and bottom ones by oligo(dT)10–25. Left, PASS reads; middle and right, reads with < 2 As immediately downstream of the LAP where the LAP is not near a pA, i.e., type 3 in (c). Reads whose LAP is flanked by A-rich sequences (middle) or non-A-rich sequences (right) are shown. The percent of total reads is shown in each graph. An A-rich sequence is defined as ≥ 6 consecutive As or ≥ 7 As in a 10 nt window in the −10 to +10 nt region around the LAP. (e) Percent of PASS reads assigned to rRNA, snoRNA, and snRNA genes for data generated by CU5T45 or oligo(dT)10–25. The ratio of the values is indicated.
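The read types in (c) and the A-rich definition in (d) can be expressed as simple rules. The following is a minimal Python sketch of those rules only, not the authors' pipeline. The function names, the representation of "As immediately downstream of the LAP" as the unaligned tail of a read, and the precomputed distance to the nearest pA are all assumptions made for illustration.

```python
def leading_a_count(seq: str) -> int:
    """Count the consecutive As at the start of seq."""
    count = 0
    for base in seq.upper():
        if base != "A":
            break
        count += 1
    return count


def classify_read(unaligned_tail: str, dist_to_nearest_pa: int) -> int:
    """Return the read type (1, 2 or 3) described in Figure 1c.

    unaligned_tail: read bases immediately downstream of the LAP
        (hypothetical input; assumed to be the part that did not align).
    dist_to_nearest_pa: absolute distance in nt from the LAP to the
        nearest pA (hypothetical input, assumed non-negative).
    """
    if leading_a_count(unaligned_tail) >= 2:
        return 1  # PASS read: >= 2 As immediately downstream of the LAP
    # Fewer than 2 As: split by whether the LAP is near a pA (<= 24 nt).
    return 2 if dist_to_nearest_pa <= 24 else 3


def is_a_rich(region: str) -> bool:
    """Apply the A-rich rule to the -10 to +10 nt region around the LAP.

    A-rich means >= 6 consecutive As, or >= 7 As in any 10 nt window.
    """
    region = region.upper()
    if "A" * 6 in region:
        return True
    return any(
        region[i:i + 10].count("A") >= 7
        for i in range(len(region) - 9)
    )
```

For example, `classify_read("AAAT", 100)` gives type 1 regardless of distance, while a read with no downstream As falls into type 2 or 3 depending only on the 24 nt threshold.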
Figure 2. Mouse pAs identified by 3′READS
(a) Distribution of PASS reads in the mouse genome (data from all the samples are included). (b) An example gene (Pde3a) showing PASS reads from 3′READS and RNA-seq reads (ENCODE project) used to assign pAs to the gene. (c) Histogram of the length of 3′ end extension for RefSeq mRNA genes (9,612 genes with extension > 0 nt). The median is indicated. (d) Venn diagram comparing pAs in the PolyA_DB 2 database with those identified in this study. (e) Percent of mRNA or lncRNA genes considered to have APA at different isoform relative abundance cutoffs. Numbers of mRNA and lncRNA genes analyzed are indicated.
Figure 3. Comparison of pAs flanked by A-rich or non-A-rich sequences
(a) Nucleotide profile around the pAs identified in this study. Left, pAs not flanked by A-rich sequences; right, pAs flanked by A-rich sequences. (b) Relative abundance of isoforms using pAs flanked by A-rich sequences or other pAs (non-A-rich). The cumulative fraction curves are based on all genes analyzed in this study and on all samples combined. (c) PAS distribution in the −40 to −1 nt region for A-rich and non-A-rich pAs. The frequencies of occurrence of AAUAAA in the −40 to −11 nt and −10 to −1 nt regions for A-rich and non-A-rich pAs are shown in a table. (d) Enrichment of CstF64 CLIP-seq reads around A-rich or non-A-rich pAs relative to randomly selected gene regions. Error bars represent the 90% confidence interval derived from bootstrapping (1,000 ×) of data.
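The caption does not say which bootstrap variant produced the error bars in (d). As a rough illustration, a percentile bootstrap with 1,000 resamples and a 90% interval could look like the sketch below. The use of the mean as the statistic, the function name, and the fixed seed are assumptions for the example; the enrichment statistic in the paper is likely computed differently.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_boot=1000, level=0.90, seed=0):
    """Percentile bootstrap confidence interval for the mean of values.

    values: observed per-site measurements (hypothetical input).
    n_boot: number of resamples (1,000 in the caption).
    level: confidence level (0.90 in the caption).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    # Resample with replacement and record the statistic each time.
    stats = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * n_boot)
    hi_idx = min(n_boot - 1, int((1.0 - tail) * n_boot))
    return stats[lo_idx], stats[hi_idx]
```

Calling `bootstrap_ci(measurements)` returns the lower and upper bounds that would be drawn as the error bar around the observed value.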
Figure 4. APA of mouse mRNA and lncRNA genes
(a) Schematic of pA types. The full and short names for different pA types are indicated. The type number in parentheses corresponds to the isoform number shown in the graph. Dotted lines indicate splicing. (b) Distribution of alternative pAs in different regions of mRNA or lncRNA genes. The p-value for the difference in distribution between mRNA and lncRNA genes is 0 (Chi-squared test). (c) Relative abundance of APA isoforms using different types of pA. The cumulative fraction curve is based on all genes analyzed in this study and on all samples combined. (d) Frequency of various PAS types for different types of pAs in mRNA and lncRNA genes. (e) Enrichment of CstF64 CLIP-seq reads around different types of pAs relative to randomly selected gene regions. Error bars represent the 90% confidence interval derived from bootstrapping (1,000 ×) of data. (f) mRNA regions affected by alternative pAs. pAs were grouped based on gene type (multi-exon or single exon) and pA location. mRNA regions were separated into 5′UTR, CDS and 3′UTR. For intronic pAs, the mRNA region affected was defined by the exon immediately upstream of the pA. (g) Distribution of 3′UTR length for genes without alternative pAs in the 3′UTR (single), and genes with APA in the 3′UTR. For the latter, the shortest and longest isoforms were used for analysis. (h) APA regulates conserved elements in lncRNAs. Conserved elements are based on 30 mammalian species (see Methods for details). The numbers of the conserved elements upstream or downstream of the first pA were calculated. In total, 599 and 391 lncRNA genes have the first pA in the upstream region and 3′-most exon, respectively. Only isoforms with relative expression level > 20% were analyzed.
Figure 5. General transcript lengthening in cell differentiation and embryonic development
(a) Schematic showing differentiation of 3T3-L1 and C2C12 cells. 3T3-L1 cells were stained with Oil Red O (ORO). (b) Regulation of alternative pAs in the 3′-most exon. The number of genes with significantly upregulated distal pA isoforms (red dots) and the number of genes with significantly upregulated proximal pA isoforms (cyan dots) are indicated in each graph. The ratio of the numbers (upregulated vs. downregulated) is also indicated to show the general trend of regulation. Significantly regulated isoforms are those with p-value < 0.05 (Fisher’s Exact test) and abundance change > 5%. Only the two most abundant isoforms for each gene were analyzed. (c) Regulation of alternative pAs in upstream regions. As in (b), except that upstream region pA isoforms were compared with 3′-most exon isoforms. All upstream region pA isoforms were grouped together and so were the 3′-most exon isoforms. (d) Isoforms using strong pAs tend to be relatively upregulated in differentiation and development. Isoform relative abundance in whole body mix (top panels) and in cell line mix (bottom panels, cell line mix 1 in Supplementary Table 2) for those upregulated (UP) and downregulated (DN) in differentiation and development. Regulated isoforms are those with p-value < 0.05 (Fisher’s Exact test) and abundance change > 5%, compared to all other isoforms of the same gene. Differentiation of 3T3-L1 and C2C12 cells and embryonic development are shown. Upstream region pAs are also shown as I/E (intron/upstream exon) pAs. (e) Top 5-mers consistently enriched for regions around the pAs of upregulated isoforms in differentiation of C2C12 and 3T3-L1 cells. P-value was based on the Fisher’s Exact test comparing pAs of upregulated isoforms with those of downregulated ones. Three regions surrounding the pA were analyzed, i.e., −100 to −41 nt, −40 to −1 nt, and +1 to +100 nt.
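The significance rule in (b), p-value < 0.05 by Fisher's exact test together with an abundance change > 5%, can be sketched for a single gene as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes the test is applied to a 2×2 table of read counts (proximal and distal isoform in two samples), that the test is two-sided, and that "abundance change" means the difference in the distal isoform's fraction of the two isoforms. All function and parameter names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


def call_regulation(prox_1, dist_1, prox_2, dist_2,
                    alpha=0.05, min_change=0.05):
    """Label one gene by comparing sample 1 with sample 2.

    Inputs are read counts for the proximal and distal pA isoforms;
    each sample is assumed to have at least one read.
    """
    p = fisher_exact_two_sided(prox_1, dist_1, prox_2, dist_2)
    frac_1 = dist_1 / (prox_1 + dist_1)
    frac_2 = dist_2 / (prox_2 + dist_2)
    change = frac_2 - frac_1
    if p < alpha and abs(change) > min_change:
        return "distal_up" if change > 0 else "proximal_up"
    return "unchanged"
```

With counts of 80 proximal and 20 distal reads before differentiation and 20 proximal and 80 distal reads after, `call_regulation(80, 20, 20, 80)` labels the gene as distal-upregulated, which corresponds to a red dot in the panel under the assumptions above.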
