Nucleic Acids Res. 2017 Sep 6;45(15):e145.
doi: 10.1093/nar/gkx594.

Data exploration, quality control and statistical analysis of ChIP-exo/nexus experiments

Rene Welch et al. Nucleic Acids Res.

Abstract

ChIP-exo/nexus experiments rely on innovative modifications of the commonly used ChIP-seq protocol for high resolution mapping of transcription factor binding sites. Although many aspects of ChIP-exo data analysis are similar to those of ChIP-seq, these high throughput experiments pose a number of unique quality control and analysis challenges. We develop a novel statistical quality control pipeline and accompanying R/Bioconductor package, ChIPexoQual, to enable exploration and analysis of ChIP-exo and related experiments. ChIPexoQual evaluates a number of key issues including strand imbalance, library complexity, and signal enrichment of data. Assessment of these features is facilitated through diagnostic plots and summary statistics computed over regions of the genome with varying levels of coverage. We evaluated our QC pipeline with both large collections of public ChIP-exo/nexus data and multiple, new ChIP-exo datasets from Escherichia coli. ChIPexoQual analysis of these datasets resulted in guidelines for using these QC metrics across a wide range of sequencing depths and provided further insights for modelling ChIP-exo data.

Figures

Figure 1.
ChIP-seq versus ChIP-exo/nexus. (A) Processing of sonicated fragments bound by TF before immunoprecipitation and PCR amplification: For ChIP-exo, an exonuclease enzyme (orange hexagon) trims the 5′ ends of each DNA fragment to a fixed distance from the TF. For ChIP-nexus, a random barcode is added on the 3′ end, and transferred to the 5′ stopping base by self-circularization. For both ChIP-exo and SE ChIP-seq, an adaptor is ligated (green triangles) at the 5′ ends. The adaptors are ligated to both ends for PE ChIP-seq. (B) Forward Strand Ratio densities for SE ChIP-seq and ChIP-exo peaks. (C) Hexbin plot of PE ChIP-seq bin counts vs. ChIP-exo bin counts. (D) Mappability score vs. mean ChIP-exo read counts with error bands. (E) GC-content vs. mean ChIP-exo read counts with error bands. (F) SCC curves for human CTCF from HeLa cell lines. The SCC curve for the ChIP-exo sample from (1) is shown in the left panel, and the SCC curves for the ChIP-seq samples from (17) are shown in the right panel. The ChIP-exo curve shows local maxima at the motif and read lengths. SE ChIP-seq curves for both replicates are maximized at the fragment length and show local maxima at the read length.
Figure 2.
ChIP-exo QC pipeline ChIPexoQual. The ChIP-exo reads are partitioned into overlapping clusters of reads separated by gaps (step 1). For each region, the following summary statistics are calculated (step 2) and visualized (step 3): Average Read Coefficient (ARC), Unique Read Coefficient (URC) and Forward Strand Ratio (FSR). These statistics are visualized as: (A) URC versus ARC plot, which presents the overall balance between library complexity and enrichment; there are two arms, one with low ARC and varying URC and one where the URC decreases as the ARC increases. (B) Region Composition plot, which shows the strand composition for all the regions formed by a minimum number of reads. (C) FSR distribution plot, which illustrates the FSR’s distribution as the depths of the islands get larger. (D) Example of the Blacklisted region analysis module. Both β1 and β2 scores are significantly higher for islands overlapping the blacklisted regions, and are robust to the removal of those islands.
Figure 3.
ChIPexoQual diagnostic plots for the FoxA1 ChIP-exo data (20). (A) URC versus ARC plot, (B) Region Composition plot, (C) FSR distribution plot comparison across three replicates and (D) β1 and β2 scores stratified based on overlap with the blacklisted regions.
Figure 4.
Validation of the ChIPexoQual pipeline with FoxA1 ChIP-exo (A–C) and TBP ChIP-exo/nexus (D, E) data. (A) Comparison of the top 50, 100, 250, 500, 1000 and 2000 FIMO scores for each replicate. (B) FoxA1 average coverage plots of the 5′ read ends centered around motif start positions separated by replicate and strand. (C) FoxA1 FSR distribution of ChIPexoQual islands overlapping ChIP-exo peaks stratified by the number of motifs. (D) Comparison of the top 50, 100, 250, 500, 1000, 2000, 4000 and 8000 FIMO scores for each TBP ChIP-exo/nexus sample. (E) TBP FSR distribution of ChIPexoQual islands overlapping ChIP-exo/nexus peaks stratified by the number of motifs.
Figure 5.
Comparison of ChIPexoQual numerical summaries. (A) β1 and (B) β2 for all eukaryotic ChIP-exo/nexus samples. (C) Average estimated β1 and (D) β2 for the ChIP-exo/nexus TBP samples in K562 cell lines when sub-sampling 20M to 50M reads.
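
The three steps named in the Figure 2 caption (partition reads into islands, then compute ARC, URC and FSR per island) can be illustrated with a minimal Python sketch. This is not the ChIPexoQual implementation, which is an R/Bioconductor package. The formulas below are assumptions, since this page does not define them: ARC is taken as reads per base of island width, URC as distinct 5′ positions per read, and FSR as the fraction of forward-strand reads. The `Read` type and both function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical read record: half-open interval [start, end) and strand "+" or "-".
Read = namedtuple("Read", ["start", "end", "strand"])


def partition_into_islands(reads):
    """Step 1: group reads into islands, i.e. maximal runs of
    overlapping reads that are separated from each other by gaps."""
    islands = []
    current, current_end = [], None
    for read in sorted(reads, key=lambda r: (r.start, r.end)):
        if current and read.start < current_end:
            # Read overlaps the running island: extend it.
            current.append(read)
            current_end = max(current_end, read.end)
        else:
            # Gap found: close the running island and start a new one.
            if current:
                islands.append(current)
            current, current_end = [read], read.end
    if current:
        islands.append(current)
    return islands


def island_stats(island):
    """Step 2: per-island summary statistics (assumed definitions)."""
    depth = len(island)
    width = max(r.end for r in island) - min(r.start for r in island)
    # The 5' end is the start of a forward read and the end of a reverse read.
    five_prime = {(r.start if r.strand == "+" else r.end, r.strand) for r in island}
    forward = sum(1 for r in island if r.strand == "+")
    return {
        "depth": depth,
        "width": width,
        "ARC": depth / width,            # assumed: reads per base of island width
        "URC": len(five_prime) / depth,  # assumed: distinct 5' positions per read
        "FSR": forward / depth,          # assumed: fraction of forward-strand reads
    }


# Toy input: a duplicated forward read, an overlapping reverse read,
# and an isolated reverse read.
reads = [
    Read(100, 136, "+"),
    Read(100, 136, "+"),
    Read(110, 146, "-"),
    Read(500, 536, "-"),
]
islands = partition_into_islands(reads)
stats = [island_stats(island) for island in islands]
```

Under these assumed definitions, a heavily duplicated island has high ARC and low URC, which matches the second arm of the URC versus ARC plot described in the caption, while an FSR near 0 or 1 flags an island made up of reads from a single strand.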

References

    1. Rhee H.S., Pugh F. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell. 2011; 147:1408–1419.
    2. He Q., Johnston J., Zeitlinger J. ChIP-nexus enables improved detection of in vivo transcription factor binding footprints. Nat. Biotechnol. 2014; 33:395–401.
    3. Kasinathan S., Orsi G.A., Zentner G.E., Ahmad K., Henikoff S. High-resolution mapping of transcription factor binding sites on native chromatin. Nat. Methods. 2014; 11:203–209.
    4. Skene P.J., Henikoff S. A simple method for generating high-resolution maps of genome-wide protein binding. eLife. 2015; e09225:1–9.
    5. Mahony S., Franklin P.B. Protein-DNA binding in high-resolution. Crit. Rev. Biochem. Mol. Biol. 2015; 50:269–283.
