Review
Nat Rev Genet. 2014 Nov;15(11):709-21.
doi: 10.1038/nrg3788. Epub 2014 Sep 16.

Identifying and mitigating bias in next-generation sequencing methods for chromatin biology

Clifford A Meyer et al. Nat Rev Genet. 2014 Nov.

Abstract

Next-generation sequencing (NGS) technologies have been used in diverse ways to investigate various aspects of chromatin biology by identifying genomic loci that are bound by transcription factors, occupied by nucleosomes or accessible to nuclease cleavage, or loci that physically interact with remote genomic loci. However, reaching sound biological conclusions from such NGS enrichment profiles requires many potential biases to be taken into account. In this Review, we discuss common ways in which biases may be introduced into NGS chromatin profiling data, approaches to diagnose these biases and analytical techniques to mitigate their effect.

Figures

Figure 1. Overview of ChIP-seq, DNase-seq, ATAC-seq and MNase-seq experiments
A genomic locus analyzed by complementary chromatin profiling experiments reveals different facets of chromatin structure: ChIP-seq reveals binding sites of specific transcription factors, DNase-seq and ATAC-seq reveal regions of open chromatin, and MNase-seq identifies well-positioned nucleosomes. In ChIP-seq, chromatin immunoprecipitation (ChIP) is used to extract DNA fragments that are bound to the target protein, either directly or via other proteins in a complex containing the target factor. In DNase-seq, chromatin is lightly digested by the DNase I endonuclease, and size selection is used to enrich for fragments that are produced in regions of chromatin where the DNA is highly sensitive to DNase I attack. ATAC-seq is an alternative to DNase-seq that uses an engineered Tn5 transposase to cleave DNA and to integrate primer DNA sequences into the cleaved genomic DNA. In MNase-seq, micrococcal nuclease (MNase), an endo-exonuclease, processively digests DNA until it reaches an obstruction such as a nucleosome.
Figure 2. Fragmentation Effects in DNase-seq and ChIP-seq
Chromatin structure, fragmentation and enrichment interact to produce biased patterns of enrichment across the genome. (a) Some transcription factors, such as CTCF, typically bind in short nucleosome-depleted regions that are flanked by arrays of nucleosomes. In DNase-seq, short fragments are far more efficient than longer ones for identifying such sites. (b) Histones and other factors that associate with nucleosomes rather than linker regions may also be located in DNase I hypersensitive regions. Longer fragments may be more efficient for detecting the binding of such factors. (c) Some factors bind in linker regions that are flanked by loosely organized nucleosomes. Such regions can be enriched in both long and short fragments in DNase-seq. (d) In ChIP-seq, chromatin is typically fragmented by sonication. Like DNase I digestion, sonication is more efficient in regions of open chromatin, so factors bound in open chromatin contexts are more likely to be identified by ChIP-seq.
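Panels (a) to (c) imply that DNase-seq fragments of different lengths carry different information. A minimal sketch of separating fragments by length before looking for enrichment is given here. The function name `split_by_length`, the tuple layout and the 100 bp cutoff are assumptions made for illustration and are not taken from the Review.

```python
def split_by_length(fragments, cutoff=100):
    """Partition fragments into (short, long) lists by length in bp.

    fragments: iterable of (chrom, start, end) with half-open coordinates.
    cutoff:    hypothetical length threshold in bp; choose it per experiment.
    """
    short, long_ = [], []
    for chrom, start, end in fragments:
        if end <= start:
            raise ValueError(f"invalid fragment: {chrom}:{start}-{end}")
        length = end - start
        # Fragments shorter than the cutoff go to `short`, the rest to `long_`.
        (short if length < cutoff else long_).append((chrom, start, end))
    return short, long_


# Made-up fragments of 60, 180 and 100 bp.
example_fragments = [
    ("chr1", 1000, 1060),
    ("chr1", 2000, 2180),
    ("chr1", 3000, 3100),
]
short_frags, long_frags = split_by_length(example_fragments)
```

Enrichment could then be assessed separately on `short_frags` and `long_frags`, so that a site of the kind shown in panel (a) is not diluted by longer fragments.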
Figure 3. Variability of H3K4me3 ChIP-seq in human embryonic stem (ES) and differentiated cell lines
Several factors, including fragmentation, immunoprecipitation conditions and PCR biases, can lead to different patterns of H3K4me3 enrichment at gene promoters in the same cell line. Coarse characteristics of H3K4me3 enrichment are consistent between samples, for example, the depletion of H3K4me3 immediately upstream of the transcription start site of a core set of genes. Closer inspection reveals clear qualitative and quantitative differences between samples. For example, some samples show sharper peaks, perhaps due to differences in MNase digestion conditions and fragment selection. Regions that appear to be different between ES cells and differentiated cells in ChIP-seq samples produced by laboratory B also show variability in ES cell ChIP-seq replicates produced by laboratory A. These differences cannot be eliminated simply by scaling read counts to account for differences in read depth, because the effects are not uniform across all genes. Quantitative comparisons of ChIP-seq signal are problematic unless biological replicates are done and protocols are carried out in a highly consistent way to produce data with comparable characteristics. Modeling bias can help reduce the amount of unexplained variability and increase sensitivity in detecting true differences between sample groups.
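The point about read-depth scaling can be illustrated with a small sketch. It assumes per-promoter read counts and total mapped read counts are already available; all numbers below are made up, and `scale_to_depth` is a hypothetical helper, not a method from the Review.

```python
def scale_to_depth(counts, total_reads, per=1_000_000):
    """Scale raw per-region read counts to reads per `per` mapped reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return [c * per / total_reads for c in counts]


# Hypothetical read counts at the same three promoters in two samples.
sample_a_counts = [100, 200, 400]   # assumed 10 million mapped reads
sample_b_counts = [300, 300, 600]   # assumed 20 million mapped reads

scaled_a = scale_to_depth(sample_a_counts, 10_000_000)
scaled_b = scale_to_depth(sample_b_counts, 20_000_000)

# Per-promoter ratio of sample B to sample A after depth scaling.
ratios = [b / a for a, b in zip(scaled_a, scaled_b)]
```

With these made-up numbers the ratios still differ between promoters after scaling. A single global factor only removes differences shared by every region, so gene-specific effects of the kind described in the caption remain.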
