Genome Res. 2011 Mar;21(3):456-64.
doi: 10.1101/gr.112656.110. Epub 2010 Nov 24.

High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells

Alan P Boyle et al. Genome Res. 2011 Mar.

Abstract

Regulation of gene transcription in diverse cell types is determined largely by varied sets of cis-elements where transcription factors bind. Here we demonstrate that data from a single high-throughput DNase I hypersensitivity assay can delineate hundreds of thousands of base-pair resolution in vivo footprints in human cells that precisely mark individual transcription factor-DNA interactions. These annotations provide a unique resource for the investigation of cis-regulatory elements. We find that footprints for specific transcription factors correlate with ChIP-seq enrichment and can accurately identify functional versus nonfunctional transcription factor motifs. We also find that footprints reveal a unique evolutionary conservation pattern that differentiates functional footprinted bases from surrounding DNA. Finally, detailed analysis of CTCF footprints suggests multiple modes of binding and a novel DNA binding motif upstream of the primary binding site.

Figures

Figure 1.
DNase-seq identifies protein–DNA footprints. All potential CTCF binding sites were identified genome-wide using motif matching and compiled such that their 5′ end was set at position zero. Cumulative DNase-seq and CTCF ChIP-seq signals within 500 bp of each site in both directions were determined. (A) CTCF motifs that have a DNase I footprint (red) also display high CTCF ChIP-seq signal (green). (B) CTCF motifs that have no footprint have greatly reduced CTCF ChIP-seq signal. (C) Footprinting using DNase-seq accurately identified footprints within the FMR1 promoter region previously mapped using traditional in vitro DMS footprinting. Dips in raw DNase-seq signal and annotated footprints correspond perfectly with previously identified footprints (gray boxes) (Drouin et al. 1997). The phastCons annotation shows increased levels of evolutionary conservation within called footprints. (D) A representative individual region displaying a DNase I footprint matching a known CTCF binding motif (gray box) with a strong corresponding CTCF ChIP-seq signal. See also Supplemental Figure 1A.
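The cumulative signal described in this caption (motif sites aligned at their 5′ end, with signal summed over 500 bp in both directions) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' pipeline: the function name `aggregate_profile`, the dictionary-of-lists signal layout, and the `(chrom, five_prime, strand)` site tuples are all assumptions made for the example, as is the choice to flip minus-strand sites so that every motif reads 5′ to 3′.

```python
def aggregate_profile(signal, sites, flank=500):
    """Sum a per-base signal around motif sites aligned at their 5' end.

    signal: dict of chromosome name -> list of per-base read counts
            (hypothetical in-memory layout, chosen for illustration).
    sites:  iterable of (chrom, five_prime, strand) tuples, where
            five_prime is the 0-based genomic position of the motif's 5' end.
    flank:  number of bases to include on each side of position zero.
    """
    profile = [0] * (2 * flank + 1)
    for chrom, five_prime, strand in sites:
        track = signal[chrom]
        # Minus-strand sites are walked in the opposite genomic direction
        # so that all motifs share the same 5'-to-3' orientation.
        step = 1 if strand == "+" else -1
        for offset in range(-flank, flank + 1):
            pos = five_prime + step * offset
            if 0 <= pos < len(track):  # skip positions off the chromosome end
                profile[offset + flank] += track[pos]
    return profile


# Toy data (made up): one plus-strand site and one minus-strand site.
toy_signal = {"chr1": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
toy_sites = [("chr1", 4, "+"), ("chr1", 5, "-")]
print(aggregate_profile(toy_signal, toy_sites, flank=2))  # [9, 9, 9, 9, 9]
```

Index `flank` of the returned list corresponds to position zero in the caption. The same function could be applied to a DNase-seq track and a ChIP-seq track separately to produce the two curves being compared.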
Figure 2.
Accuracy of footprinting model. Positive predictive value (PPV) was calculated for predictions of four factors: CTCF, REST, GABP, and SRF. True-positives were determined by ChIP-seq peaks with a matching motif, while true-negatives were determined by motifs without corresponding ChIP-seq peaks. PPV is shown for predictions using only PWMs (all PWMs are considered actual binding sites), PWMs that map within DHS sites (all PWMs within a DNase I hypersensitive site are considered actual binding sites, while those PWMs outside of DNase I hypersensitive sites are considered negatives), and PWMs that map within footprints (all PWMs within a footprint are considered actual binding sites, while those PWMs outside of footprints are considered negatives). The total number of PWMs mapped to the genome for each factor is listed in parentheses.
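The comparison in this caption can be illustrated with a small calculation. The sketch below uses the standard definition PPV = TP / (TP + FP) and assumes that a predicted site counts as a true positive when it overlaps a ChIP-seq peak and as a false positive otherwise. The function name `ppv`, the motif identifiers, and all of the sets are invented for the example and are not data from the paper.

```python
def ppv(predicted, chip_bound):
    """Positive predictive value: fraction of predicted sites that are bound.

    predicted:  set of motif site IDs called as binding sites by a predictor.
    chip_bound: set of motif site IDs that overlap a ChIP-seq peak.
    """
    if not predicted:
        raise ValueError("PPV is undefined when no sites are predicted")
    true_positives = len(predicted & chip_bound)
    return true_positives / len(predicted)


# Toy data (made up): eight motif matches for one hypothetical factor.
all_motifs = {"m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"}
in_dhs = {"m1", "m2", "m3", "m4"}        # motifs inside a DHS site
in_footprint = {"m1", "m2", "m3"}        # motifs inside a footprint
chip_bound = {"m1", "m2", "m5"}          # motifs under a ChIP-seq peak

# The three predictors from the caption, from least to most restrictive.
for name, predicted in [("PWM only", all_motifs),
                        ("PWM in DHS", in_dhs),
                        ("PWM in footprint", in_footprint)]:
    print(f"{name}: PPV = {ppv(predicted, chip_bound):.3f}")
```

With these toy sets, restricting predictions to footprints raises PPV because fewer unbound motifs are called, at the cost of missing the bound motif `m5` that lies outside any footprint.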
Figure 3.
Identification of cell type–specific footprints. Cumulative DNase-seq footprinting signals were determined across seven different cell lines for REST (A), TLX1-NFIC (B), and IRF2 (C). For each factor, the same set of motifs was used for all seven cell types. DNase-seq read counts were calculated in the regions surrounding these motifs, similar to Figure 1. Regions shaded in gray represent cell types that display reduced footprinting signal. HUVEC IRF2 shows moderate footprinting signal (light gray). Note that for REST, all cell lines display consistent signals.
Figure 4.
Conservation of sequence in and around DNase I footprints. (A) In general, footprints contain a strong sequence conservation signal with a nearby “shoulder” of conservation around all footprinted regions (black). Between the conservation peak and shoulder is a region with a marked decrease in conservation. This conservation pattern is not detected when the signal is centered on DNase I hypersensitive sites (red). The average conservation signal across the genome is shown in green. (B) The conservation pattern for a single factor, NFYA, displays the characteristic drop in conservation surrounding the footprint. (C) This conservation pattern is not detected around CTCF footprints, which show relatively little conservation outside the highly conserved footprint. See also Supplemental Figure 6.
Figure 5.
High-resolution analysis of CTCF binding sites. (A) Cumulative footprinting signal at all CTCF motif predicted sites, including sites with and without a large increase in DNase I digestion upstream of the CTCF motif. The light gray bar indicates the location of the known CTCF motif. The dark gray bar represents the location of a novel binding motif. The novel binding motif was only detected in CTCF footprints that contain the small upstream region with a spike in DNase I hypersensitivity (HS). Note that the entire protected region is approximately 50–60 bases. (B) Strand-specific DNase-seq signal for the subset of CTCF motif identified sites that contain the upstream DNase I HS spike. The DNase I HS spike is only detected on the positive strand. (C) The same strand-specific signal for the CTCF motif identified sites without the upstream DNase I HS spike. The diagram below each plot in B and C illustrates the estimated strand-specific protected regions surrounding the CTCF motif predicted sites.
