Front Genet. 2014 Apr 10:5:75.
doi: 10.3389/fgene.2014.00075. eCollection 2014.

Impact of artifact removal on ChIP quality metrics in ChIP-seq and ChIP-exo data

Thomas S Carroll et al. Front Genet.

Abstract

With the advent of ChIP-seq multiplexing technologies and the subsequent increase in ChIP-seq throughput, the development of working standards for the quality assessment of ChIP-seq studies has received significant attention. The ENCODE consortium's large-scale analysis of transcription factor binding and epigenetic marks, together with concordant work on ChIP-seq by other laboratories, has established a new generation of ChIP-seq quality control measures. However, the use of these metrics alongside common processing steps has not been evaluated. In this study, we investigate the effects of blacklisting and duplicate-read removal on established metrics of ChIP-seq quality and show that the interpretation of these metrics depends strongly on the preprocessing steps applied. Furthermore, we perform the first investigation of these metrics for ChIP-exo data and make recommendations for adapting the normalized strand cross-correlation coefficient (NSC) to assess ChIP-exo efficiency.

Keywords: ChIP-exo; ChIP-seq; QC; blacklist; duplicates.


Figures

Figure 1
(A) The Venn diagram represents the genomic overlap between the DAC consensus, UHS, and DER blacklists. Pie charts show the proportions of blacklist classes contained within overlapping and unique regions of the DAC consensus, UHS, and DER blacklists. (B) Bar charts show the relative enrichment of blacklist classes in regions unique to either the DER or the UHS blacklist.
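Genomic overlap between two blacklists, as summarized in a Venn diagram, comes down to counting the base pairs covered by both interval sets. The following is a minimal sketch, assuming half-open (BED-style) intervals on a single chromosome; the function names `merge` and `overlap_bp` are hypothetical and not taken from the authors' pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), BED-style coordinates (assumption)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # extend the previous interval instead of starting a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Base pairs covered by both interval sets (single chromosome)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total
```

For example, `overlap_bp([(0, 100), (200, 300)], [(50, 250)])` counts 50 bp in each of the two overlapping stretches. Merging first ensures that overlapping entries within one blacklist are not double-counted.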
Figure 2
(A) Boxplots show the percentage of total reads for all (red), duplicated (blue), and multi-mapped (orange) reads within the DAC consensus, UHS, and DER blacklists for ENCODE/SYDH datasets. (B) Boxplots show the percentage of total reads for all (red), duplicated (blue), and multi-mapped (orange) reads within the DAC consensus blacklist for ENCODE/SYDH and CRUK datasets. (C) Boxplots illustrating the range of RPKM within blacklist classes for DER-only regions, DAC consensus regions not within UHS, and overlapping DAC consensus and UHS regions.
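The two quantities plotted here, the percentage of reads falling in blacklisted regions and the RPKM of a region, can be sketched as follows. This assumes each read is represented by a single position (for example its 5' end) on one chromosome and that the regions are sorted and non-overlapping; the helper names are hypothetical. RPKM uses its standard definition: reads per kilobase of region per million mapped reads.

```python
import bisect
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end); assumed sorted and non-overlapping


def count_in_regions(positions: List[int], regions: List[Interval]) -> int:
    """Count read positions that fall inside any region."""
    starts = [s for s, _ in regions]
    count = 0
    for pos in positions:
        k = bisect.bisect_right(starts, pos) - 1  # last region starting at or before pos
        if k >= 0 and pos < regions[k][1]:
            count += 1
    return count


def percent_reads_in_regions(positions: List[int], regions: List[Interval]) -> float:
    """Percentage of all reads that fall inside the regions."""
    return 100.0 * count_in_regions(positions, regions) / len(positions)


def rpkm(region_count: int, region_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of region per million mapped reads."""
    return region_count * 1e9 / (region_length_bp * total_reads)
```

The same counting function could be applied separately to all, duplicated, or multi-mapped reads to reproduce the three categories shown in the boxplots.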
Figure 3
(A) Boxplots show the range of SSD values for CRUK and ENCODE/SYDH input samples with no filtering applied and after filtering of signal from the DAC consensus, UHS, and DER blacklists. (B) Boxplots of SSD scores for input, transcription factor (TF), and histone mark datasets from ENCODE and CRUK following blacklisting with the DAC consensus regions.
Figure 4
IGV screenshot of an example CTCF ChIP signal showing the distribution of Watson and Crick signal around the CTCF motif and the distribution of Watson and Crick signal following extension of reads to the expected fragment length.
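Extending reads to the expected fragment length moves Watson and Crick signal onto the same locus, because each read is extended from its 5' end toward the 3' end of the fragment. The following is a minimal sketch, assuming 0-based, half-open read coordinates; the `Read` class and `extend_read` function are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Read:
    start: int   # leftmost aligned base, 0-based
    end: int     # one past the rightmost aligned base (half-open)
    strand: str  # "+" (Watson) or "-" (Crick)


def extend_read(read: Read, fragment_length: int) -> Tuple[int, int]:
    """Extend a read from its 5' end to the expected fragment length."""
    if read.strand == "+":
        # Watson reads: the 5' end is the leftmost base, so extend rightwards
        return read.start, read.start + fragment_length
    # Crick reads: the 5' end is the rightmost base, so extend leftwards,
    # clamping at the chromosome start
    return max(0, read.end - fragment_length), read.end
```

A Watson read at 100 and a Crick read ending at 436 both cover the interval from 236 to 300 after extension to 200 bp, which is the convergence of strand signal seen around the motif.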
Figure 5
An illustration of cross-correlation assessment by shifting reads on the Watson strand. The cross-correlation profile of the CTCF ChIP sample (SRR568129) shows the dominance of the fragment-length cross-correlation peak over the read-length cross-correlation peak. In contrast, the c-Myc ChIP sample (SRR568130) shows greater cross-correlation at the read-length peak than at the expected fragment length, highlighting potential problems in fragment length prediction for that sample.
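Strand cross-correlation can be sketched as the Pearson correlation between per-base Watson and Crick 5'-end counts after shifting the Watson signal downstream by d bp. This is a single-chromosome, pure-Python sketch under that assumption; genome-wide implementations typically combine per-chromosome values, and the function names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def strand_cross_correlation(
    plus: List[int], minus: List[int], max_shift: int
) -> Dict[int, float]:
    """Correlation of Watson vs Crick 5'-end counts for shifts 0..max_shift."""
    n = len(plus)
    profile: Dict[int, float] = {}
    for d in range(max_shift + 1):
        # shifting Watson signal d bp downstream pairs plus[i] with minus[i + d]
        profile[d] = pearson(plus[: n - d], minus[d:])
    return profile
```

On synthetic data where every Watson 5' end has a Crick partner 20 bp downstream, the profile peaks at d = 20. In real data, a second "phantom" peak at the read length arises from mappability effects, which is the peak that dominates in the c-Myc example.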
Figure 6
Scatterplots show the fragment lengths predicted by cross-correlation analysis for transcription factor datasets from the ENCODE/SYDH set, with no filtering and following blacklisting by the DAC consensus, UHS, and DER blacklists.
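The predicted fragment length is the shift at which the cross-correlation profile is maximal. Because a dominant read-length peak can mislead this prediction, one simple heuristic, assumed here rather than taken from the study, is to ignore shifts within a small window of the read length. The function name and the default `exclude` window are hypothetical.

```python
from typing import Dict


def predict_fragment_length(
    profile: Dict[int, float], read_length: int, exclude: int = 10
) -> int:
    """Shift with maximal cross-correlation, skipping shifts near the read length."""
    # hypothetical heuristic: drop shifts within `exclude` bp of the read length
    candidates = {d: cc for d, cc in profile.items() if abs(d - read_length) > exclude}
    return max(candidates, key=candidates.get)
```

With a profile whose strongest value sits at the read length (36 bp), the function returns the next-best shift, which is the behaviour one would want for a sample like the c-Myc example. Different exclusion windows can give different predictions, so any filtering step that reshapes the profile can also move the predicted fragment length.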
Figure 7
(A,B) Example cross-correlation profiles for a c-Myc (A) and a CTCF (B) sample (SRR568130 and SRR568129, respectively), with no filtering, after removal of duplicated reads, after exclusion of the DAC consensus blacklist, and after simultaneous blacklisting and duplicate removal. (C) Cross-correlation profiles of reads in DAC blacklisted regions, reads in peaks, and duplicated reads for an example ER ChIP-seq sample (ERR336952). (D) Cross-correlation profiles for duplicated reads inside and outside of peaks for an example ER ChIP-seq sample (ERR336952).
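Duplicate removal for single-end data is commonly done by treating reads that share chromosome, 5' position, and strand as duplicates and keeping one representative. The following is a minimal sketch under that assumed convention; the tuple layout and function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# (chromosome, 5' position, strand); a hypothetical minimal read representation
Read = Tuple[str, int, str]


def remove_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read at each (chromosome, 5' position, strand)."""
    seen = set()
    kept: List[Read] = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            kept.append(read)
    return kept


def duplication_rate(reads: List[Read]) -> float:
    """Fraction of reads that would be removed as duplicates."""
    return 1.0 - len(set(reads)) / len(reads)
```

Reads at the same position but on opposite strands are kept, since they are not duplicates under this definition.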
Figure 8
(A,B) Boxplots of RSC scores for TF datasets from the ENCODE/SYDH (A) and CRUK (B) sets after different filtering steps. For the CRUK set, only the DAC consensus blacklist was used to evaluate the effect of blacklisting, given its greater enrichment for artifact signal than the DER and UHS blacklists in ENCODE data. (C,D) Boxplots of the change in cross-correlation signal at the fragment length (fragment strand cross-correlation; FSC) for TF datasets from the ENCODE/SYDH (C) and CRUK (D) sets following removal of blacklisted regions, of duplicated reads, and of both blacklisted regions and duplicated reads.
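NSC and RSC are usually defined from the cross-correlation profile as NSC = CC(fragment length) / min(CC) and RSC = (CC(fragment length) − min(CC)) / (CC(read length) − min(CC)), with the minimum taken over the computed shift range. The following sketch applies these definitions to a profile stored as a dict from shift to correlation. Expressing the change in FSC as a simple difference between the filtered and unfiltered profiles at a fixed fragment length is an assumption made for illustration.

```python
from typing import Dict

Profile = Dict[int, float]  # shift (bp) -> cross-correlation


def nsc(profile: Profile, fragment_length: int) -> float:
    """Normalized strand cross-correlation coefficient."""
    return profile[fragment_length] / min(profile.values())


def rsc(profile: Profile, fragment_length: int, read_length: int) -> float:
    """Relative strand cross-correlation coefficient."""
    background = min(profile.values())
    return (profile[fragment_length] - background) / (profile[read_length] - background)


def delta_fsc(before: Profile, after: Profile, fragment_length: int) -> float:
    """Assumed definition: change in cross-correlation at the fragment length."""
    return after[fragment_length] - before[fragment_length]
```

Because RSC divides by the read-length peak height, filtering steps that mainly remove read-length signal (for example, blacklisted artifact regions) can raise RSC even when the fragment-length signal is unchanged.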
Figure 9
(A,B) Cross-correlation profiles for reads in peaks and reads in the DAC consensus blacklist for ChIP-seq and ChIP-exo ER ChIP (A) and for ChIP-exo FoxA1 ChIP (B).
Figure 10
Cross-correlation profile for FoxA1 ChIP-exo after no filtering, removal of DAC blacklisted regions and removal of duplicated reads.
