Nucleic Acids Res. 2015 Jan;43(2):817-35.
doi: 10.1093/nar/gku1361. Epub 2014 Dec 29.

Identification of RNA polymerase III-transcribed Alu loci by computational screening of RNA-Seq data

Anastasia Conti et al. Nucleic Acids Res. 2015 Jan.

Abstract

Of the ∼1.3 million Alu elements in the human genome, only a tiny number are estimated to be active in transcription by RNA polymerase (Pol) III. Tracing the individual loci from which Alu transcripts originate is complicated by their highly repetitive nature. By exploiting RNA-Seq data sets and unique Alu DNA sequences, we devised a bioinformatic pipeline allowing us to identify Pol III-dependent transcripts of individual Alu elements. When applied to ENCODE transcriptomes of seven human cell lines, this search strategy identified ∼1300 Alu loci corresponding to detectable transcripts, with ∼120 of them expressed in at least three cell lines. In vitro transcription of selected Alus did not reflect their in vivo expression properties, and required the native 5'-flanking region in addition to the internal promoter. We also identified a cluster of expressed AluYa5-derived transcription units, juxtaposed to snaR genes on chromosome 19, formed by a promoter-containing left monomer fused to an Alu-unrelated downstream moiety. Autonomous Pol III transcription was also revealed for Alus nested within Pol II-transcribed genes. The ability to investigate Alu transcriptomes at single-locus resolution will facilitate both the identification of novel biologically relevant Alu RNAs and the assessment of Alu expression alteration under pathological conditions.
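
The recurrence criterion mentioned in the abstract (loci expressed in at least three cell lines) can be illustrated with a minimal Python sketch. It assumes that an upstream step has already produced, for each cell line, a set of expressed Alu locus identifiers; the function name, the locus identifiers and the toy data are hypothetical and are not taken from the paper's pipeline.

```python
from collections import Counter
from typing import Dict, List, Set


def loci_in_at_least(expressed: Dict[str, Set[str]], min_lines: int = 3) -> List[str]:
    """Return the loci reported as expressed in at least `min_lines` cell lines.

    `expressed` maps a cell line name to the set of Alu locus identifiers
    called as expressed in that line (hypothetical input format).
    """
    # Each locus is counted once per cell line because the values are sets.
    counts = Counter(locus for loci in expressed.values() for locus in loci)
    return sorted(locus for locus, n in counts.items() if n >= min_lines)


# Toy example with made-up locus identifiers.
toy = {
    "HeLa": {"alu_1", "alu_2", "alu_3"},
    "K562": {"alu_1", "alu_3"},
    "line_C": {"alu_1", "alu_2", "alu_3"},
    "line_D": {"alu_2"},
}
recurrent = loci_in_at_least(toy, min_lines=3)
```

With the toy input, all three made-up loci pass the threshold; raising `min_lines` to 4 leaves none.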

Figures

Figure 1.
Architecture of Alu elements considered as RNA polymerase III transcription units. (A) Schematic representation of a typical Alu element, ∼300 bp in length (indicated by graduated bar). Alu transcription by RNA polymerase III requires A box and B box internal promoter elements (orange bars) (6), which form together the binding site for TFIIIC. The consensus sequences for Alu A and B boxes are reported above the scheme. While the Alu B box sequence perfectly matches the canonical B box sequence found in tRNA genes, the sequence of Alu A box slightly diverges from canonical A box sequence (TRGYnnAnnnG; (5)). Transcription is thought to start at the first Alu nt (G) (3,4). The A box starts at position +13, the B box 53 bp downstream, at position +77. The left and right arms of the Alu, each being ancestrally derived from 7SL RNA, are separated from each other by an intermediate A-rich region, starting 35 bp downstream of the B box, whose consensus sequence is A5TACA6. Another A-rich tract is located 3′ to the right arm, at the end of the Alu body, starting at ∼150 bp downstream of the middle A-rich region. Transcription termination by RNA polymerase III is expected to mainly occur at the first encountered termination signal (Tn) downstream of the 3′ terminal A-rich tract. Such a signal, either a run of at least four Ts or a T-rich non-canonical terminator (25), may be located at varying distances from the end of the Alu body, thus allowing for the generation of Alu primary transcripts carrying 3′ trailers of different lengths and sequences. 
(B) Possible localizations of Alu elements with respect to other transcription units: (i) intergenic/antisense, comprising purely intergenic Alus as well as Alus which are not included in longer transcription units on the same strand, but overlap in antisense orientation to transcription units located on the opposite strand; (ii and iii) gene-hosted, comprising Alus fully contained within introns or UTRs of protein-coding or lincRNA genes in a sense orientation; (iv) all other cases, including Alu RNAs fully or partially mapping to exons, or partially mapping to UTRs, in a sense orientation.
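
The termination rule described in panel A can be sketched in a few lines of Python. The sketch scans the sequence downstream of the Alu body for the first canonical terminator, taken here to be a run of at least four Ts; T-rich non-canonical terminators (25) are not modelled. The function name and the example sequences are hypothetical and only illustrate the rule as stated in the caption.

```python
import re
from typing import Optional

# Canonical Pol III terminator as described in the caption: a run of >= 4 Ts.
# Non-canonical T-rich terminators are deliberately left out of this sketch.
CANONICAL_TERMINATOR = re.compile(r"T{4,}")


def first_canonical_terminator(downstream: str) -> Optional[int]:
    """Return the 0-based offset of the first run of >= 4 Ts, or None.

    `downstream` is the DNA sequence (coding strand, 5' to 3') that follows
    the 3' terminal A-rich tract of the Alu. The returned offset equals the
    length of the 3' trailer preceding the terminator.
    """
    match = CANONICAL_TERMINATOR.search(downstream.upper())
    return match.start() if match else None


# A run of three Ts is skipped; the first run of four Ts starts at offset 9.
offset = first_canonical_terminator("ACGTTTACGTTTTGA")
```

Because the first qualifying signal can lie at different distances from the Alu body, the returned offset varies from locus to locus, which is what produces 3′ trailers of different lengths.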
Figure 2.
Alu RNA identification pipeline. Shown is a flow-diagram of the bioinformatic pipeline for the identification of autonomously expressed Alu loci from RNA-seq data sets. See Results and Materials and Methods for details.
Figure 3.
Base-resolution expression profiles for six representative Alus of the intergenic/antisense type. Panels A–C and F refer to purely intergenic Alus, panels D and E to two antisense Alus. Shown are the Integrative Genomics Viewer (IGV; http://www.broadinstitute.org/igv/home) visualizations of RNA-seq stranded expression profiles (in bigwig format) around Alu loci in the cell lines indicated either on the left (A–E) or on the right (F) of each panel. r1 and r2 indicate the two independent replicates found in ENCODE data. The orientation and chromosomal coordinates of each Alu, as well as the overlapping (antisense) or nearby RefSeq genes, are indicated in each panel. The dark red bars in panel F indicate regions associated to either TFIIIC (Tf3c1 track) or Pol III (Rpc155 track) in HeLa cells as derived from ENCODE ChIP-seq data.
Figure 4.
Base-resolution expression profiles for five representative gene-hosted, sense-oriented Alus. Panels A–C refer to Alus hosted within introns of RefSeq genes, panel D to a 3′UTR-hosted Alu, panel E to an Alu hosted within a lincRNA gene intron. Shown are the IGV visualizations of RNA-seq stranded expression profiles (in bigwig format) around Alu loci in the cell lines indicated either on the left (A–D) or on the right (E) of each panel. r1 and r2 tracks refer to the two independent replicates found in ENCODE data. The orientation and chromosomal coordinates of each Alu, as well as the host RefSeq or lincRNA genes, are indicated in each panel. The dark red bars in panels B and F identify regions associated to the indicated Pol III transcription component (Bdp1, Tf3c1 or Rpc155) in either K562 or HeLa cells as derived from ENCODE ChIP-seq data.
Figure 5.
Novel AluYa5-derived transcription units associated to snaR clusters. (A) Genome browser visualization of RNA-seq stranded expression profiles of three AluYa5-derived transcription units (Ya5-lm, indicated by red arrows) within the snaR A/C/D cluster on chromosome 19 (41). (B) Transcription unit architecture and sequence of a Ya5-lm repeat (coordinates in parentheses). (C) Sequence alignment of Ya5-lm with Repbase reference sequences for AluYa5 and AluYb8. (D) Genome browser visualizations of RNA-seq stranded expression profiles around the Ya5-lm element represented in panel B, in the cell lines indicated on the left. The dark red bars identify regions associated to Pol III (Rpc155 subunit) in either K562 or HeLa cells as derived from ENCODE ChIP-seq data.
Figure 6.
In vitro transcription analysis of wild-type and B box-mutated Alu loci. In vitro transcription reactions were performed in HeLa nuclear extract using 0.5 μg of the indicated Alu templates (lanes 5–10, 15–20, 25–30). A previously characterized Alu producing a 372-nt RNA (lanes 2, 12, 22) and a human tRNAVal gene producing a known transcript pattern due to heterogeneous transcription termination (lanes 3, 13, 23) (25) were used as positive controls for in vitro transcription and, at the same time, as a source of RNA size markers. Negative control reactions contained either empty pGEM®-T Easy vector (lanes 1, 11, 21) or no template DNA (no-template control (NTC), lanes 4, 14, 24). For each Alu, both the wild-type and a B box-mutated (Bmut) version were tested.
Figure 7.
In vitro transcription analysis of upstream deleted Alu loci. In vitro transcription reactions were performed in HeLa nuclear extract using 0.5 μg of the indicated Alu templates (lanes 5–10, 15–20, 25–30). A previously characterized Alu producing a 372-nt RNA (lanes 2, 12, 22) and a human tRNAVal gene producing a known transcript pattern due to heterogeneous transcription termination (lanes 3, 13, 23) (25) were used as positive controls for in vitro transcription and, at the same time, as a source of RNA size markers. Negative control reactions contained either empty pGEM®-T Easy vector (lanes 1, 11, 21) or no template DNA (no-template control (NTC), lanes 4, 14, 24). For each Alu, both the wild-type and a mutant version lacking most of the native 5′-flanking region (5del) were tested. For each of the nine Alus subjected to 5′-flank deletion, the extent of reduction of transcription activity, observed with respect to the corresponding wild-type Alu, is reported below the lanes corresponding to each wt-mutant pair. The values represent the average of two independent transcription experiments that differed by no more than 20% of the mean.
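
The reporting rule in the last sentence of this caption can be written as a small check. The Python sketch below averages two replicate values and accepts the pair only if they differ by no more than 20% of their mean. Reading the criterion as |a − b| ≤ 0.20 × mean is an assumption, and the function name and example numbers are hypothetical rather than taken from the paper.

```python
from typing import Optional


def average_if_consistent(a: float, b: float, tolerance: float = 0.20) -> Optional[float]:
    """Return the mean of two replicate values, or None if they disagree.

    The pair is accepted when the absolute difference between the replicates
    is at most `tolerance` times their mean (assumed reading of the caption).
    """
    mean = (a + b) / 2.0
    if abs(a - b) <= tolerance * abs(mean):
        return mean
    return None


# Made-up replicate values: 4.5 and 5.0 differ by 0.5, within 20% of 4.75.
accepted = average_if_consistent(4.5, 5.0)
# 4.0 and 5.0 differ by 1.0, which exceeds 20% of 4.5.
rejected = average_if_consistent(4.0, 5.0)
```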

References

    1. Deininger P. Alu elements: know the SINEs. Genome Biol. 2011;12:236.
    2. Dieci G., Conti A., Pagano A., Carnevali D. Identification of RNA polymerase III-transcribed genes in eukaryotic genomes. Biochim. Biophys. Acta. 2013;1829:296–305.
    3. Elder J.T., Pan J., Duncan C.H., Weissman S.M. Transcriptional analysis of interspersed repetitive polymerase III transcription units in human DNA. Nucleic Acids Res. 1981;9:1171–1189.
    4. Fuhrman S.A., Deininger P.L., LaPorte P., Friedmann T., Geiduschek E.P. Analysis of transcription of the human Alu family ubiquitous repeating element by eukaryotic RNA polymerase III. Nucleic Acids Res. 1981;9:6439–6456.
    5. Orioli A., Pascali C., Pagano A., Teichmann M., Dieci G. RNA polymerase III transcription control elements: themes and variations. Gene. 2012;493:185–194.