Nucleic Acids Res. 2022 Feb 28;50(4):2111-2127.
doi: 10.1093/nar/gkac088.

Endogenous retroviruses co-opted as divergently transcribed regulatory elements shape the regulatory landscape of embryonic stem cells

Stylianos Bakoulis et al. Nucleic Acids Res.

Abstract

Transposable elements are an abundant source of transcription factor binding sites, and favorable genomic integration may lead to their recruitment by the host genome for gene regulatory functions. However, it is unclear how frequent co-option of transposable elements as regulatory elements is, to which regulatory programs they contribute and how they compare to regulatory elements devoid of transposable elements. Here, we report a transcription initiation-centric, in-depth characterization of the transposon-derived regulatory landscape of mouse embryonic stem cells. We demonstrate that a substantial number of transposable element insertions, in particular endogenous retroviral elements, are associated with open chromatin regions that are divergently transcribed into unstable RNAs in a cell-type specific manner, and that these elements contribute to a sizable proportion of active enhancers and gene promoters. We further show that transposon subfamilies contribute differently and distinctly to the pluripotency regulatory program through their repertoires of transcription factor binding site sequences, shedding light on the formation of regulatory programs and the origins of regulatory elements.

Figures

Figure 1.
TEs are divergently transcribed into unstable RNAs. (A) Average distribution of CAGE-inferred TSS locations (vertical axis; expression agnostic) ±500 bp upstream/downstream and across the body of major TE families (horizontal axis). TSS locations are visualized separately for the sense (upper panel) and antisense (middle panel) strands. (B) Average distribution of CAGE-inferred TSS locations for the ERVK family. (C) Transcriptional directionality score, describing the strand bias in expression levels (ranges between -1 for 100% minus strand expression and 1 for 100% plus strand expression), for mRNA and non-mRNA (non-protein-coding GENCODE transcripts) as well as TE-associated and non-TE-associated RNAs (regardless of annotation). (D) Exosome sensitivity, measuring the relative amount of exosome-degraded RNAs (ranges between 0 for RNAs unaffected by the exosome and 1 for 100% unstable RNAs), for transcripts associated with LTR families ERV1, ERVK, ERVL, and ERVL-MaLR. For comparison, exosome sensitivity is shown for mRNAs and gene-distal loci. (E and F) Genome browser tracks for two loci of unannotated transcripts with characteristic divergent expression patterns falling on TE insertions of ORR1A2 (ERVL-MaLR; (E)) and RMER17B (ERVK; (F)) subfamilies. Pooled replicate CAGE expression levels in control (Scr) and after exosome depletion (Rrp40) split by plus (blue) and minus (red) strands are shown. For visibility reasons, the scales of CAGE signals differ between strands and conditions.
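The two scores in panels (C) and (D) can be illustrated with a minimal sketch. The formulas below are assumptions chosen only to reproduce the ranges stated in the caption (-1 to 1 for directionality, 0 to 1 for exosome sensitivity); the paper's Materials and Methods may define them differently, and the function names are hypothetical.

```python
def directionality(plus: float, minus: float) -> float:
    """Strand bias in expression: -1 = all minus strand, 1 = all plus strand.

    Assumed formula: (plus - minus) / (plus + minus).
    """
    total = plus + minus
    if total <= 0:
        raise ValueError("directionality is undefined without expression")
    return (plus - minus) / total


def exosome_sensitivity(control: float, knockdown: float) -> float:
    """Fraction of RNA degraded by the exosome: 0 = unaffected, 1 = fully unstable.

    Assumed formula: (knockdown - control) / knockdown, floored at 0.
    `control` stands for expression in Scr and `knockdown` for expression
    after Rrp40 depletion.
    """
    if knockdown <= 0:
        return 0.0
    return max(0.0, (knockdown - control) / knockdown)


if __name__ == "__main__":
    # A balanced divergent locus has a directionality near 0.
    print(directionality(plus=12.0, minus=10.0))
    # A transcript seen mostly after exosome depletion scores close to 1.
    print(exosome_sensitivity(control=2.0, knockdown=8.0))
```

Under these assumed definitions, a divergently transcribed element producing unstable RNAs would show a directionality near 0 and an exosome sensitivity near 1.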
Figure 2.
Transcription of transposable elements reveals co-opted regulatory elements. (A) Fraction of expressed TEs in control (Scr) and exosome KD (Rrp40) mESCs as well as all TEs annotated in RepeatMasker (expression agnostic) per genomic annotation group for each TE family. The number of instances of each TE family is shown in parentheses. (B) Percentages of orthologous TE-associated, ERV-associated, and non-TE-associated expressed DHSs, as well as background regions between mouse and rat or human genomes. (C) The number of transcribed TEs overlapping FANTOM5 mouse enhancers at the TE family level. TE subfamily counts are displayed in Supplementary Figure S9A. (D) The number of transcribed TEs overlapping STARR-seq mESC enhancers at the TE family level. TE subfamily counts are displayed in Supplementary Figure S9B.
Figure 3.
TE insertions co-opted as divergently transcribed enhancers. (A and B) Genome browser tracks for two intergenic loci showing TPM-normalized CAGE data pooled across replicates and split by plus (blue) and minus (red) strands. Shown are also the locations of FANTOM5 mouse enhancers and STARR-seq mESC enhancers and signal tracks for ENCODE DNase-seq data and H3K4me1 and H3K27ac ChIP-seq data for E14 mESCs. The CAGE signals identify divergent transcription initiation from ERV1 RLTR41 insertions, alone (A) and in pairs (B) with TE insertions of RMER10B and MYSERV-int.
Figure 4.
Transcribed TEs exhibit chromatin features of regulatory elements. (A) Hierarchical clustering of histone modification (ChIP-seq) signals ±2000 bp around the summits of TE-associated clusters of CAGE-inferred TSSs (CAGE tag clusters, Materials and Methods). The ChIP-seq signal is shown as fold-change over input control. Clusters are represented in rows (color coded in left legend) and histone modifications in columns. Average distributions of ChIP-seq signals for each cluster are shown (top panel). (B) Annotations of TE-associated CAGE tag clusters based on GENCODE and RepeatMasker TE classes and families. (C and D) Bar plots of odds ratios of enrichments (Fisher’s exact test) of TE classes (C) and TE families (D) in each cluster.
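The enrichment statistic in panels (C) and (D) can be sketched for a single TE class and a single cluster. This is an illustration only: the 2x2 table layout, the one-sided alternative, and the function names are assumptions, and the counts in the usage lines are made up for the example, not taken from the paper.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio of a 2x2 table.

    a: tag clusters in the cluster of interest that belong to the TE class
    b: tag clusters in the cluster of interest that do not
    c: tag clusters outside the cluster that belong to the TE class
    d: tag clusters outside the cluster that do not
    """
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value (enrichment).

    Sums hypergeometric probabilities of tables with the same margins
    whose top-left count is at least `a`.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)
    upper = min(row1, col1)
    numer = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return numer / denom


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(odds_ratio(8, 2, 1, 5))
    print(fisher_exact_greater(8, 2, 1, 5))
```

An odds ratio above 1 indicates that the TE class is over-represented in the cluster relative to the remaining clusters, and the p-value indicates how surprising that over-representation is given the table margins.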
Figure 5.
ERV subfamilies contribute to distinct enrichments of binding sites for pluripotency factors. (A) Motif enrichments for selected TFs (columns) in transcribed TEs across selected ERV subfamilies (rows) versus a background of non-TE genomic regions ±200 bp around the summits of all CAGE-inferred TSS clusters. White cells indicate no enrichment and cases of complete depletion were assigned the lowest detected score, represented in dark blue according to the scale. The full TF enrichment heatmap is shown in Supplementary Figure S15A. Example genomic insertion sites for ERVK subfamilies are shown to illustrate their differences in carrying putative TF binding sites for Sox2, Esrrb and Oct4. (B) Correlations (Pearson’s r) between TF motif enrichments across all ERV subfamilies (as shown in Supplementary Figure S15A). The full heatmap of correlations is given in Supplementary Figure S15B. (C) Similarity (q-value) between binding motifs for selected TFs. Sequence logos (right) for Oct4, Nanog and Klf4 exemplify differences and similarities of TF binding sites.
Figure 6.
ERV subfamilies contribute to distinct gene regulatory programs. (A) GO term enrichment for putative target genes (ABC) of gene-distal ERVs split by ERV subfamily (foreground) versus all ABC-predicted target genes (background). For ease of visualization, gene ontology terms are colored by manually curated process or function, and the underlying gene ontology term enrichments are shown for RLTR13B1 and MLTR14. Full results are provided in Supplementary Figure S16. (B) Predicted enhancer interactions with the promoter of gene Taf2 (TATA-Box Binding Protein Associated Factor 2). The four enhancers are marked by gray boxes and a zoom-in is provided below, showing overlaps with ERV insertions of RMER10B, MLTR14 and RMER17C. Tracks for GENCODE (M19) transcripts, FANTOM5 enhancers, STARR-seq enhancers, and TEs provided by RepeatMasker are shown. (C) Expression fold-change (log2), as measured by CAGE, in mESCs versus EBs at ERV-associated DHSs that carry (right) or lack (left) predicted binding sites for pluripotency factors. (D) Gene-level expression (TPM normalized), as measured by CAGE, quantified for ABC-linked genes of transcribed TE-associated DHSs that carry predicted binding sites for pluripotency factors.
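The quantities plotted in panels (C) and (D) can be sketched as follows. TPM here means CAGE tags per million mapped tags; the pseudocount of 1 and the function names are assumptions for illustration and are not taken from the paper.

```python
from math import log2
from typing import List


def tags_per_million(counts: List[float]) -> List[float]:
    """Scale raw CAGE tag counts so that the library sums to one million."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("library has no tags")
    return [count * 1e6 / total for count in counts]


def log2_fold_change(mesc: float, eb: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of expression in mESCs versus EBs.

    The pseudocount (an assumed value) avoids division by zero for
    regions that are silent in one condition.
    """
    return log2((mesc + pseudocount) / (eb + pseudocount))


if __name__ == "__main__":
    # Positive values mean higher expression in mESCs than in EBs.
    print(log2_fold_change(mesc=7.0, eb=1.0))
    print(tags_per_million([10, 30, 60]))
```

With this convention, a region that loses expression upon differentiation into EBs has a positive log2 fold-change.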
