Cell. 2012 Jan 20;148(1-2):335-48.
doi: 10.1016/j.cell.2011.11.058. Epub 2012 Jan 12.

Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages

Dominic Schmidt et al. Cell. 2012.

Erratum in

  • Cell. 2012 Feb 17;148(4):832

Abstract

CTCF-binding locations represent regulatory sequences that are highly constrained over the course of evolution. To gain insight into how these DNA elements are conserved and spread through the genome, we defined the full spectrum of CTCF-binding sites, including a 33/34-mer motif, and identified over five thousand highly conserved, robust, and tissue-independent CTCF-binding locations by comparing ChIP-seq data from six mammals. Our data indicate that activation of retroelements has produced species-specific expansions of CTCF binding in rodents, dogs, and opossum, which often functionally serve as chromatin and transcriptional insulators. We discovered fossilized repeat elements flanking deeply conserved CTCF-binding regions, indicating that similar retrotransposon expansions occurred hundreds of millions of years ago. Repeat-driven dispersal of CTCF binding is a fundamental, ancient, and still highly active mechanism of genome evolution in mammalian lineages.

Figures

Graphical abstract
Figure 1
CTCF Occupancy in Five Placental Mammalian Genomes Reveals a Large Core Set of Conserved Binding. (A) The total numbers of CTCF-binding events found in orthologous locations between each pair of placental species are shown as row-column intersections. The right-most numbers for each species represent all alignable CTCF-binding peaks (total peaks are in parentheses). Percentages are pairwise percentage averages between species (Experimental Procedures). (B) A five-way comparison of CTCF binding in five placental mammals identified a shared set of 5,178 CTCF-binding events. (C) The upper track shows CTCF binding after CTCF knockdown (CTCF) in human MCF-7 cells (Figure S1F). The track immediately below shows CTCF binding after control RNAi (mock). The bottom five tracks show CTCF-binding data from the livers of five mammalian species in syntenic regions, demonstrating that highly conserved CTCF-binding events are less sensitive to perturbation by RNAi knockdown. (D) The fractions of binding events found only in human (human only) or shared among all placental mammals (five-way) were characterized by their sensitivity to RNAi knockdown of CTCF protein. Very few deeply shared CTCF-binding events were affected by CTCF knockdown. (E) Relation between motif information content and motif sequence conservation for nine TFs in human. (F) Relation between motif length and motif sequence conservation for the same TFs as in (E). See also Figure S1 and Table S2.
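Panel (E) relates motif information content to sequence conservation. The legend does not define how information content was computed; one common definition, sketched below under the assumption of a uniform 0.25 background base composition, sums 2 minus the Shannon entropy (in bits) of each column of a position weight matrix. The function name and the toy matrix are hypothetical illustrations, not the authors' code.

```python
import math


def motif_information_content(pwm):
    """Total information content (bits) of a DNA position weight matrix.

    pwm: list of positions, each a list of four base probabilities (A, C, G, T)
    summing to 1. Assumes a uniform 0.25 background, so each position
    contributes 2 - H bits, where H is the Shannon entropy of that column.
    """
    total = 0.0
    for column in pwm:
        entropy = -sum(p * math.log2(p) for p in column if p > 0)
        total += 2.0 - entropy
    return total


# Hypothetical 3-position toy motif.
toy_pwm = [
    [1.0, 0.0, 0.0, 0.0],      # invariant base: 2 bits
    [0.5, 0.0, 0.5, 0.0],      # two equally likely bases: 1 bit
    [0.25, 0.25, 0.25, 0.25],  # uniform column: 0 bits
]
print(motif_information_content(toy_pwm))  # 3.0
```

Under this definition, longer motifs can accumulate more total information, which is one reason panels (E) and (F) are worth viewing side by side.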
Figure 2
CTCF Binding Often Occurs at a Highly Conserved Motif, Consisting of a Two-Part Profile. (A) Motifs (M1 and M2) identified de novo from CTCF-binding events. (B) Binding event counts and numbers of binding events with at least one motif (M1 and M1+M2) in all six species. M1+M2 20,21 represents the preferred spacing patterns of these two submotifs. (C) DNA sequence constraint around the CTCF motif in human, plotted as observed/expected genomic evolutionary rate profiling (GERP, red) scores (Cooper et al., 2005). The frequencies of unchanged bases in five-way shared CTCF-binding events are shown as a position weight matrix (PWM) below the GERP profile. (D) Peaks containing the M2 motif at the preferred spacing show stronger ChIP enrichment by both read count and peak width, are more highly shared among mammals, and are resistant to RNAi-mediated knockdown. (E) A multiple mammalian sequence alignment of a CTCF peak at the APP gene is shown. The DNase I footprint (red box, Quitschke et al., 2000) encompasses a complete 34 bp M1 and M2 CTCF motif. (F) DNA sequence of the human c-myc promoter (Human c-myc Fragment A), which is bound by CTCF in vivo and in vitro (Filippova et al., 1996). The sequence contains the canonical M1 CTCF motif (red) and the M2 motif (blue). A 3 bp mutation in the M2 motif that eliminates CTCF binding in vitro is indicated in green. See also Figure S2.
Figure 3
CTCF Motif Usage Shows a Conserved Hierarchy among Placental Mammals. Heat map of the 2,492 CTCF motif-words found at least five times in any species, anchored to human; words are normalized by their background occurrences within each genome. This set of words is found in 27,543 human binding events. The data are sorted by decreasing frequency in the human column, and Spearman rank correlations after one-dimensional hierarchical clustering of the rows are shown. The average ChIP enrichment of the motif-words, separated into bins of 100 words, is shown as a bar chart (left). Similarly, the fraction of five-way conserved CTCF-binding events within the same bins is shown as a bar chart (right). See also Figure S3.
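The normalization and rank correlation in this legend can be illustrated with a short sketch. It assumes, hypothetically, that each species provides a count of each motif-word at its bound sites and a count of the same word in its genomic background; the two are divided, and species are compared with a Spearman rank correlation (the Pearson correlation of ranks, with ties given their average rank). The words and counts are invented, and the hierarchical clustering step is omitted.

```python
def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def normalized_usage(bound_counts, background_counts, words):
    """Count of each word at bound sites divided by its genomic background count."""
    return [bound_counts.get(w, 0) / background_counts[w] for w in words]


# Hypothetical motif-words and counts for two species.
words = ["word_a", "word_b", "word_c", "word_d"]
human = normalized_usage(
    {"word_a": 50, "word_b": 30, "word_c": 20, "word_d": 5},
    {"word_a": 1000, "word_b": 1000, "word_c": 500, "word_d": 1000},
    words,
)
mouse = normalized_usage(
    {"word_a": 40, "word_b": 10, "word_c": 25, "word_d": 2},
    {"word_a": 800, "word_b": 1000, "word_c": 500, "word_d": 1000},
    words,
)
print(round(spearman(human, mouse), 3))  # 0.949
```

Normalizing by background occurrences keeps a word from appearing highly used simply because it is common in one genome; the rank-based correlation then compares the ordering of word usage rather than absolute frequencies.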
Figure 4
Repeat Expansions Remodeled CTCF Binding in Three Mammalian Lineages. (A) Heat map of 71 motif-words identified as highly enriched in mammalian lineages. (B) Lineage-specific repeats associated with the lineage-specific motif-words. (C) Venn diagram showing the number of B2 repeat-associated binding events shared between mouse and rat. (D) Frequencies of distances between the centers of M1 and M2 in all six species studied. The spacing between M1 and M2 is smaller in mouse and rat (blue arrow) because of the B2 repeat expansion. (E) Sections of the aligned consensus sequences of CTCF-carrying retrotransposons in mouse, rat, dog, and opossum; the rat and mouse sequences contain the M1+M2 motif, whereas the dog and opossum sequences contain only M1. Consensus motifs for CTCF binding derived solely from bound repeat instances are shown below each alignment. (F) Estimated ages of lineage-specific repeats that expanded CTCF binding. White box plots show all instances of the indicated repeat; red box plots show only those bound by CTCF. See also Figure S4.
Figure 5
Intermittent Repeat Expansions Can Lead to Conserved, Lineage-Specific, and Species-Specific CTCF Binding in Mammals. A CTCF-binding site found within an ancient transposon shows conserved binding in placental and nonplacental mammals (left data inset) and must therefore have been present in the mammalian ancestor (ur-Mammal). In contrast, a CTCF-binding site generated in the eutherian ancestor (ur-Placental) shows conserved binding across placental mammals but is absent in marsupials (right data inset). More recent expansions of CTCF binding lead to increasingly lineage- and species-specific CTCF binding. For example, the expansion of B2 repeats in the mouse-rat ancestor (ur-Rodent) created CTCF binding that is highly shared between mouse and rat, whereas continued B2 expansions along each lineage also generated species-specific CTCF-binding sites (see Figure 4C). See also Table S3.
Figure 6
Chromatin Boundaries Separated by Repeat-Associated CTCF Binding in Rodents. (A) A B2-associated CTCF-binding event separates the ApoA cluster from downstream genes on mouse chromosome 9 (top blue track). Active transcription is reflected both by H2AK5ac occupancy in mouse liver (bottom green track) and by direct sequencing of mouse liver mRNA, shown as gene name shading (red, silent; green, active) (Mortazavi et al., 2008). (B) Heat map representation of H2AK5ac chromatin domains flanked by CTCF binding that is shared among all five species (five-way), unique to mouse and repeat-associated (mouse RABs), repeat-associated and shared between mouse and rat (mouse and rat shared RABs), or in none of these categories (all other). (C) Violin plots represent gene expression differences (Manhattan distances) between H2AK5ac- and CTCF-defined chromatin domains for different gene pair categories. See also Figure S6.
Figure 7
Tandem Gene Pairs Separated by CTCF Differ More in Their Expression than Gene Pairs that Are Not Separated by CTCF. (A) Examples of tandem gene pairs that are either separated by CTCF binding or not separated by CTCF (no). The CTCF-separated tandem gene pairs are further divided into three groups: (1) shared among the five mammals shown in (B) (five-way shared), (2) associated with lineage-specific repeats (repeat-associated, RAB), and (3) all other CTCF-separated gene pairs (all other). (B) Violin plots represent gene expression difference distributions (Manhattan distance) for each tandem gene pair group defined in (A). Stars indicate p values smaller than 0.001 relative to the no-CTCF-binding category (Wilcoxon rank-sum test).
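The expression difference of a gene pair is summarized here as a Manhattan distance. The sketch below assumes, hypothetically, that each gene is represented by a vector of expression values measured in the same ordered set of samples; the legend does not specify which samples or which expression scale were used. Per-pair distances from two groups could then be compared with a Wilcoxon rank-sum test (available in SciPy as `scipy.stats.ranksums`).

```python
def manhattan_distance(profile_a, profile_b):
    """Sum of absolute per-sample differences between two expression profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same samples")
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


# Hypothetical expression values for two tandem genes across four samples.
gene_1 = [5.0, 3.5, 0.0, 2.0]
gene_2 = [4.0, 3.5, 1.5, 0.0]
print(manhattan_distance(gene_1, gene_2))  # 4.5
```

A larger distance means the two neighboring genes are expressed less similarly across samples, which is the quantity the violin plots compare between CTCF-separated and non-separated pairs.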
Figure S1
CTCF Binding Is Primarily Directed by Genetic Sequence and Is Highly Conserved; Western Blot Confirmation of CTCF RNAi and Tissue Specificity of Conserved CTCF Binding, Related to Figure 1. (A) In the first column (Hs-chr21), 10 kb windows around human CTCF-binding events were ordered based on whether a syntenic CTCF-binding event is present in mouse liver. In the second and third columns, CTCF binding in the Tc1 mouse is shown for human chromosome 21 (Tc1-Hs-chr21) and for the orthologous mouse sequences (Tc1-Mm-chr16, 17, 10). Most CTCF binding found on human chromosome 21 in human liver is recapitulated in the mouse liver. (B) Genome tracks displaying CTCF binding near the liver-expressed gene CLDN14 in human (red, Hs-chr21) and Tc1 mouse (blue, Tc1-Hs-chr21; green, Tc1-Mm-chr16). (C) Genomic occupancy of HNF4A, CEBPA (orange tracks), and CTCF (blue tracks) around the liver gene APOA2 in human, mouse, and dog. Grey lines connect orthologous regions between species. (D) Binding events for CEBPA, HNF4A, and CTCF sorted by whether they occur in one, two, or three of the placental species from (C). (E) The fractions of binding events found only in human (human only) or shared among all placental mammals (five-way) were characterized by their tissue specificity. Few deeply shared CTCF-binding events are tissue specific in humans. (F) Western blot of nuclear extracts from human MCF-7 cells after CTCF RNAi or mock RNAi, and from non-transfected (NT) cells. (G) Read profiles of CTCF binding after CTCF RNAi (red lines) or mock RNAi (black lines) in MCF-7 cells. CTCF binding was separated into two groups: (1) human-specific binding events in liver that are also found in MCF-7 cells (human only binding events) and (2) five-way shared binding events in liver that overlap with CTCF binding in MCF-7 cells (five-way shared binding events).
(H) The total numbers of CTCF-binding events (CTCF-bound regions) are shown for the following data sets: MCF-7 after mock RNAi (MCF-7), MCF-7 after CTCF RNAi (MCF-7 KD), and human liver (Liver). The bottom two rows show the CTCF-binding overlaps between MCF-7 and Liver and between MCF-7 KD and Liver. Total CTCF-binding overlaps are indicated on the left and further split into three categories: five-way shared, human-specific, and all other CTCF binding.
Figure S2
CTCF Motifs (M1 and M2) and Motif Occurrences, Related to Figure 2. (A) Motifs identified de novo from CTCF-binding events in all six species. (B) Properties of CTCF-binding events depending on the presence of M2. (C) Read profiles at CTCF-binding events where only the M1 motif (black line) or the complete two-part motif consisting of M1 and M2 (red line) was detected. (D) Binding event counts and numbers of binding events with at least one motif (M1 and M1+M2) in all six species. (E) Presence and absence of M1 and M2 in two DNA sequences from Filippova et al. (1996). The motif score (nmscan uses bits-suboptimal scoring, with 0.0 being a perfect match) is indicated under each motif instance. (F) A multiple mammalian sequence alignment of a CTCF peak at the APP gene is shown. The DNase I footprint (red box, Quitschke et al., 2000) encompasses a complete 34 bp M1 and M2 CTCF motif.
Figure S3
Motif-Word Analysis for HNF4A, Related to Figure 3. (A) Spearman correlations of HNF4A and CTCF motif-word usage between the indicated species pairs. (B) Heat map of 3,981 HNF4A motif-words found at least five times in any species; words are normalized by their background occurrences within each genome. This set of words is found in 17,661 human binding events. The data are sorted by decreasing frequency in the human column. The average ChIP enrichment of the HNF4A motif-words, separated into bins of 100 words, is shown as a bar chart (left). Similarly, the fraction of three-way conserved HNF4A-binding events within the same bins is shown as a bar chart (right).
Figure S4
CTCF Directly Binds Specific Repeat Elements, and Mouse and Rat Share Many Bound B2 Repeat Instances, Related to Figure 4. (A) Aggregate read profiles of repeat-driven CTCF-binding events in four mammals. The bars under the graphs show the density of the indicated repeats. (B) The fractions of CTCF-binding events due to B2 repeats and of all other binding events in mouse (left) and rat (right), separated into conservation groups. A “1” indicates binding, and a “0” indicates no binding in the relevant species. For example, binding events shared only between mouse and rat are depicted as “00110” and highlighted in red. More than half of the B2 repeats bound by CTCF in rat are also bound in mouse, indicating that the SINE transposon acquired CTCF binding in a common ancestor of rat and mouse. (C) Venn diagram showing the number of B2 repeat-associated binding events in the alignable genome shared between mouse and rat. (D) Estimated ages of lineage-specific repeats that expanded CTCF binding. White box plots are based on all instances of the indicated repeat; red box plots are based only on repeat instances bound by CTCF. (E) Fraction of different CTCF-binding event categories associated with mouse-specific (Mouse) or rodent-specific (Rodents) genes.
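The binary codes in panel (B) can be generated mechanically from the set of species in which an event is bound. The legend does not state the species order behind a code such as “00110”; the sketch below assumes the order human, rhesus, mouse, rat, dog, which is consistent with “00110” meaning mouse and rat only, but that order is a guess. The function and variable names are hypothetical.

```python
from collections import Counter

# ASSUMED species order for the binary codes; not stated in the legend.
SPECIES_ORDER = ("human", "rhesus", "mouse", "rat", "dog")


def conservation_pattern(bound_species, species_order=SPECIES_ORDER):
    """Encode binding presence (1) or absence (0) per species as a string."""
    return "".join("1" if s in bound_species else "0" for s in species_order)


# Hypothetical binding events, each given as the set of species where it is bound.
events = [
    {"mouse", "rat"},
    {"mouse"},
    {"mouse", "rat"},
    set(SPECIES_ORDER),
]
pattern_counts = Counter(conservation_pattern(e) for e in events)
print(pattern_counts["00110"])  # 2
```

Tallying events per code in this way yields the conservation groups whose fractions panel (B) compares between B2-associated and all other binding events.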
Figure S5
Custom Opossum CTCF Antibody Design and Validation, Related to Experimental Procedures. (A) Alignment of parts of the CTCF protein sequences of multiple mammals. The peptides used to generate the commercial (human, rhesus, mouse, rat, dog) and custom (opossum) antibodies are highlighted. (B) Wiggle tracks of CTCF and cohesin (STAG1/SA1) binding in opossum liver around the APP1 gene. The binding event highlighted with a star is at the orthologous location of the human binding event used for DNase I footprinting (Quitschke et al., 2000). (C) Violin plots of raw read counts in opossum CTCF-binding events for both CTCF replicates and for cohesin, confirming that most opossum CTCF-binding events show strong cohesin enrichment. (D) Scatter and Bland-Altman plots comparing opossum CTCF with the opossum cohesin replicates. Spearman correlations are indicated.
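Panel (D) uses Bland-Altman plots, which show, for paired measurements, the difference between the two measurements against their mean. A minimal sketch follows, assuming two equal-length lists of per-event read counts (hypothetical values) and the conventional 95% limits of agreement (mean difference ± 1.96 standard deviations of the differences). Whether the counts were transformed, for example to a log scale, before plotting is not stated in this legend.

```python
import statistics


def bland_altman(measure_a, measure_b):
    """Return Bland-Altman coordinates, bias, and 95% limits of agreement."""
    if len(measure_a) != len(measure_b):
        raise ValueError("measurements must be paired")
    means = [(a + b) / 2 for a, b in zip(measure_a, measure_b)]  # x-axis values
    diffs = [a - b for a, b in zip(measure_a, measure_b)]  # y-axis values
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation; needs >= 2 pairs
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical read counts at four binding events in two libraries.
library_1 = [10, 20, 30, 40]
library_2 = [12, 18, 33, 39]
means, diffs, bias, limits = bland_altman(library_1, library_2)
print(means)  # [11.0, 19.0, 31.5, 39.5]
print(bias)   # -0.5
```

Unlike the Spearman correlation reported alongside it, a Bland-Altman plot shows whether disagreement between two data sets depends on signal strength.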
Figure S6
Colocalization of TFs with Chromatin Barriers, Related to Figure 5 The fractions of regions bound by CTCF in mouse liver as well as Oct4 and Nanog in mouse ESCs (Marson et al., 2008) that are found to be at mouse liver H2AK5ac domain boundaries are shown. Open circles indicate Oct4 and Nanog binding events that are more than 1 kb away from a CTCF-binding event. As random controls we shifted CTCF binding randomly (Shifted CTCF) and selected a set of random genomic regions (Random regions).

References

  • Awad T.A., Bigler J., Ulmer J.E., Hu Y.J., Moore J.M., Lutz M., Neiman P.E., Collins S.J., Renkawitz R., Lobanenkov V.V., Filippova G.N. Negative transcriptional regulation mediated by thyroid hormone response element 144 requires binding of the multivalent factor CTCF to a novel target DNA sequence. J. Biol. Chem. 1999;274:27092–27098.
  • Baniahmad A., Steiner C., Köhne A.C., Renkawitz R. Modular structure of a chicken lysozyme silencer: involvement of an unusual thyroid hormone receptor binding site. Cell. 1990;61:505–514.
  • Bejerano G., Lowe C.B., Ahituv N., King B., Siepel A., Salama S.R., Rubin E.M., Kent W.J., Haussler D. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature. 2006;441:87–90.
  • Bell A.C., West A.G., Felsenfeld G. The protein CTCF is required for the enhancer blocking activity of vertebrate insulators. Cell. 1999;98:387–396.
  • Blewitt M.E., Vickaryous N.K., Paldi A., Koseki H., Whitelaw E. Dynamic reprogramming of DNA methylation at an epigenetically sensitive allele in mice. PLoS Genet. 2006;2:e49.

Supplemental References

  • Boyle A.P., Song L., Lee B.-K., London D., Keefe D., Birney E., Iyer V.R., Crawford G.E., Furey T.S. High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res. 2011;21:456–464.
  • Cooper G.M., Stone E.A., Asimenos G., Green E.D., Batzoglou S., Sidow A.; NISC Comparative Sequencing Program. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 2005;15:901–913.
  • Cuddapah S., Jothi R., Schones D.E., Roh T.-Y., Cui K., Zhao K. Global analysis of the insulator binding protein CTCF in chromatin barrier regions reveals demarcation of active and repressive domains. Genome Res. 2009;19:24–32.
  • ENCODE Project Consortium. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 2011;9:e1001046.
  • Filippova G.N., Fagerlie S., Klenova E.M., Myers C., Dehner Y., Goodwin G., Neiman P.E., Collins S.J., Lobanenkov V.V. An exceptionally conserved transcriptional repressor, CTCF, employs different combinations of zinc fingers to bind diverged promoter sequences of avian and mammalian c-myc oncogenes. Mol. Cell. Biol. 1996;16:2802–2813.

Publication types