The ENCODE Blacklist: Identification of Problematic Regions of the Genome

Haley M Amemiya et al. Sci Rep. 2019 Jun 27;9(1):9354.
doi: 10.1038/s41598-019-45839-z.

Abstract

Functional genomics assays based on high-throughput sequencing greatly expand our ability to understand the genome. Here, we define the ENCODE blacklist: a comprehensive set of regions in the human, mouse, worm, and fly genomes that show anomalous, unstructured, or high signal in next-generation sequencing experiments, independent of cell line or experiment. Removing ENCODE blacklist regions is an essential quality-control step when analyzing functional genomics data.
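In practice, "removing the blacklist" usually means discarding features (for example, called peaks) that overlap a blacklisted interval, which is comparable to running `bedtools intersect -v` with the blacklist as the second file. The abstract does not prescribe an overlap rule, so the following is a minimal sketch assuming BED-style 0-based, half-open coordinates and that any overlap of at least 1 bp removes a peak; `remove_blacklisted` is a hypothetical helper, not part of any ENCODE tool.

```python
from bisect import bisect_left
from collections import defaultdict


def _merge(intervals):
    """Merge overlapping or touching (start, end) intervals into disjoint, sorted ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def remove_blacklisted(peaks, blacklist):
    """Return the peaks that do not overlap any blacklisted region.

    peaks: records whose first three fields are (chrom, start, end); extra fields are kept.
    blacklist: iterable of (chrom, start, end). Coordinates are assumed 0-based, half-open.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in blacklist:
        by_chrom[chrom].append((start, end))

    # Disjoint, sorted intervals let a single binary search decide overlap.
    starts, ends = {}, {}
    for chrom, intervals in by_chrom.items():
        merged = _merge(intervals)
        starts[chrom] = [s for s, _ in merged]
        ends[chrom] = [e for _, e in merged]

    kept = []
    for peak in peaks:
        chrom, start, end = peak[0], peak[1], peak[2]
        if chrom in starts:
            # Last blacklisted interval that begins before the peak ends.
            i = bisect_left(starts[chrom], end) - 1
            if i >= 0 and ends[chrom][i] > start:
                continue  # overlaps by >= 1 bp, so drop it
        kept.append(peak)
    return kept
```

For example, with a blacklist of `[("chr1", 100, 200)]`, a peak at `chr1:150-300` is dropped, while `chr1:50-100` and `chr1:200-250` are kept because half-open intervals that only touch do not overlap.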

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Blacklist regions are tightly distributed across the chromosome and sequester high read-mapping signals. (a) Distribution of mapped reads along human chromosome 1 in hg19. (b) An example blacklisted region on chromosome 1. Displayed are pre-filtered ENCODE ChIP-seq peak calls, quantile-normalized median read signal (Reads), and quantile-normalized median multimapped read signal (Multi). Axes are scaled for illustrative purposes, and signal values are truncated at approximately 10-fold enrichment. Signal in these regions is up to 6400× background levels. (c) An example “normal” region on chromosome 1, selected because it contains ENCODE ChIP-seq peaks.
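Panel (b) plots a quantile-normalized median signal across experiments. The caption does not spell out the procedure, so the following is a minimal sketch of standard quantile normalization followed by a per-bin median, assuming equal-length binned signal tracks (one list per experiment). The function names are hypothetical, and ties are broken by bin position as a simplification.

```python
from statistics import mean, median


def quantile_normalize(samples):
    """Give every track the same value distribution (the mean of each rank across tracks)."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all tracks must cover the same bins")
    # For each track, the bin indices ordered from lowest to highest signal.
    orders = [sorted(range(n), key=lambda i, s=s: s[i]) for s in samples]
    # Reference distribution: mean of the k-th smallest value over all tracks.
    reference = [mean(s[order[k]] for s, order in zip(samples, orders)) for k in range(n)]
    normalized = []
    for order in orders:
        track = [0.0] * n
        for rank, idx in enumerate(order):
            track[idx] = reference[rank]  # replace each value by its rank's reference
        normalized.append(track)
    return normalized


def median_signal(samples):
    """Per-bin median across quantile-normalized tracks."""
    return [median(values) for values in zip(*quantile_normalize(samples))]
```

Quantile normalization removes differences in overall depth and distribution between experiments before the median is taken, so a single very deep experiment cannot dominate the summary track.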
Figure 2
Blacklist regions account for a significant portion of ChIP-seq reads and are driven by artifacts in genome assemblies; removing these regions is essential for eliminating noise in genomics assays. (a) The number of blacklisted regions in each species, with their average size, genomic coverage (excluding assembly gaps), and the input datasets used, for hg38, mm10, dm6, and ce11. (b) A UpSet plot showing the regions uniquely annotated in hg19 and in hg38, and the regions shared between them. Low-mappability (Low-Map.) regions account for the majority of unique regions in both hg19 and hg38. (c) Applying the blacklist to ChIP-seq peaks reduces the overall correlation and, in the highlighted example, yields a more biologically meaningful interpretation of the data.
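The statistics in panel (a) are simple interval arithmetic. A minimal sketch, assuming BED-style 0-based, half-open intervals and a caller-supplied genome length that already excludes assembly gaps (both assumptions; `summarize_blacklist` is a hypothetical helper), could compute the region count, average size, and covered fraction like this:

```python
from collections import defaultdict


def summarize_blacklist(regions, genome_size_no_gaps):
    """Return (region count, mean region length, fraction of the gap-free genome covered).

    regions: iterable of (chrom, start, end), 0-based half-open.
    genome_size_no_gaps: assumed total assembled length with N gaps removed.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    count = sum(len(v) for v in by_chrom.values())
    total_length = sum(end - start for v in by_chrom.values() for start, end in v)

    # Merge overlapping intervals so no base is counted twice in the coverage.
    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start

    mean_size = total_length / count if count else 0.0
    return count, mean_size, covered / genome_size_no_gaps
```

Average size is computed over the regions as listed, whereas coverage uses merged intervals; if a blacklist contains no overlapping regions, the two views agree.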
