Integrative annotation of chromatin elements from ENCODE data

doi:10.1093/nar/gks1284

Nucleic Acids Res. 2013 Jan;41(2):827-41.

doi: 10.1093/nar/gks1284. Epub 2012 Dec 5.

Michael M Hoffman¹, Jason Ernst, Steven P Wilder, Anshul Kundaje, Robert S Harris, Max Libbrecht, Belinda Giardine, Paul M Ellenbogen, Jeffrey A Bilmes, Ewan Birney, Ross C Hardison, Ian Dunham, Manolis Kellis, William Stafford Noble

PMID: 23221638
PMCID: PMC3553955
DOI: 10.1093/nar/gks1284

Michael M Hoffman et al. Nucleic Acids Res. 2013 Jan.

Affiliation

¹ Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA 98195-5065, USA.

Abstract

The ENCODE Project has generated a wealth of experimental information mapping diverse chromatin properties in several human cell lines. Although each such data track is independently informative toward the annotation of regulatory elements, their interrelations contain much richer information for the systematic annotation of regulatory elements. To uncover these interrelations and to generate an interpretable summary of the massive datasets of the ENCODE Project, we apply unsupervised learning methodologies, converting dozens of chromatin datasets into discrete annotation maps of regulatory regions and other chromatin elements across the human genome. These methods rediscover and summarize diverse aspects of chromatin architecture, elucidate the interplay between chromatin activity and RNA transcription, and reveal that a large proportion of the genome lies in a quiescent state, even across multiple cell types. The resulting annotation of non-coding regulatory elements correlates strongly with mammalian evolutionary constraint and provides an unbiased approach for evaluating metrics of evolutionary constraint in human. Lastly, we use the regulatory annotations to revisit previously uncharacterized disease-associated loci, resulting in focused, testable hypotheses through the lens of the chromatin landscape.
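
Segway and ChromHMM, the two segmentation methods behind the annotation maps in the figures below, assign one discrete label to each stretch of genome using many chromatin tracks at once. ChromHMM does this with a hidden Markov model over binarized marks, which is why Figure 1D reports per-state probabilities of a present mark. As a hedged toy illustration of that idea only, the sketch below decodes the most likely label path for binarized mark calls with the Viterbi algorithm, assuming a two-state model whose marks are conditionally independent Bernoulli variables given the state. The function name, the state meanings, and every parameter value are hypothetical; training (ChromHMM estimates its parameters by expectation-maximization) is omitted, and the real tools may assign final labels differently.

```python
import math


def viterbi_segment(obs, start, trans, emit):
    """Most likely state path for binarized mark vectors (toy sketch, hypothetical).

    obs:   list of 0/1 tuples, one tuple of mark calls per genomic bin
    start: start[s] = P(first bin is in state s)
    trans: trans[s][t] = P(state t at the next bin | state s at this bin)
    emit:  emit[s][m] = P(mark m present | state s); marks are assumed
           conditionally independent given the state (Bernoulli emissions)
    """
    n_states = len(start)

    def log_emit(s, x):
        # Sum of per-mark Bernoulli log-likelihoods for one bin.
        return sum(math.log(p if bit else 1.0 - p) for p, bit in zip(emit[s], x))

    # score[s]: best log-probability of any path ending in state s at the current bin.
    score = [math.log(start[s]) + log_emit(s, obs[0]) for s in range(n_states)]
    back = []  # back[i][t]: best predecessor of state t at bin i + 1
    for x in obs[1:]:
        new_score, pointers = [], []
        for t in range(n_states):
            best_s = max(range(n_states), key=lambda s: score[s] + math.log(trans[s][t]))
            pointers.append(best_s)
            new_score.append(score[best_s] + math.log(trans[best_s][t]) + log_emit(t, x))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state example: state 0 ~ "quiescent", state 1 ~ "active".
START = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]
EMIT = [[0.05, 0.05], [0.9, 0.8]]  # P(mark present | state) for two hypothetical marks
OBS = [(0, 0), (0, 0), (1, 1), (1, 0), (1, 1), (0, 0)]
path = viterbi_segment(OBS, START, TRANS, EMIT)
print(path)
```

The self-transition probability of 0.9 makes the model favour contiguous segments over switching labels at every bin; that smoothing is the main reason to use a sequence model rather than labelling each bin independently.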

Figures

**Figure 1.**
Enrichment of various segment labels (vertically, labeled by green panels) from (A) Segway and (B) ChromHMM K562 segmentations over positions on an idealized p300 binding site, gene, CTCF binding site, and LaminB1 binding site. We calculated enrichment as the base-2 logarithm of the observed frequency of a label at a particular position along an annotation divided by the expected frequency of the label from its prevalence in the genome overall. Enriched positions are shown in red, and depleted positions are shown in blue. The labels for idealized gene components at the top include the mean length of that component in parentheses. (C) Heat map of parameters from Segway training for 14 GM12878 signal tracks against 25 segment labels. Color indicates the mean of a Gaussian according to the color bar on the right. (D) Heat map of parameters from ChromHMM concatenated training on 84 signal tracks from 6 ENCODE Tier 1–2 cell types. Color indicates the probability of a present mark, as a percentage, according to the color bar on the right.
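
The enrichment in panels (A) and (B) is log2(observed frequency of a label at a position / expected frequency of that label genome-wide). A minimal sketch of that ratio follows, assuming a toy per-bin label list and point anchors measured at fixed bin offsets; the function name and the labels "T" and "Q" are hypothetical, and the scaled positions along idealized gene bodies used in the figure are not modelled here.

```python
import math
from collections import Counter


def positional_log2_enrichment(labels, anchors, offset, label):
    """log2(observed / expected) frequency of `label` at `offset` bins from each anchor.

    labels:  per-bin segment labels along one toy chromosome (hypothetical)
    anchors: bin indices of annotation positions (e.g. binding-site centres)
    offset:  signed distance in bins from each anchor
    """
    # Expected frequency: the label's prevalence across all bins.
    expected = Counter(labels)[label] / len(labels)
    # Observed frequency: how often the label sits at anchor + offset.
    positions = [a + offset for a in anchors if 0 <= a + offset < len(labels)]
    observed = sum(labels[p] == label for p in positions) / len(positions)
    if observed == 0:
        return float("-inf")  # complete depletion; real analyses may add a pseudocount
    return math.log2(observed / expected)


labels = ["Q"] * 20
labels[2] = labels[10] = "T"  # label "T" covers 2 of 20 bins, so expected = 0.1
anchors = [2, 10]
profile = {off: positional_log2_enrichment(labels, anchors, off, "T") for off in (-1, 0, 1)}
print(profile)
```

At offset 0 both anchors carry "T", so the observed frequency is 1.0 against an expectation of 0.1, giving log2(10); the flanking offsets are fully depleted.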

**Figure 2.**
View of the *ENC1* locus on the minus strand using the ENCODE GM12878 segmentations. The unusual state pattern in the middle of the gene in all three segmentations reveals a potential intronic regulatory element, which is confirmed by H3K4me1, H3K27ac, DNaseI hypersensitivity and transcription factor binding, and which overlaps a putative GENCODE processed transcript.

**Figure 3.**
(A) Enrichment or depletion of GWAS SNPs (and several comparison SNP sets) in function-associated segments. Each bar extends to the level of enrichment or depletion of a SNP set in one of the 25 segmentation classes from Segway (top) and ChromHMM (bottom) in GM12878. The results for 1000 random samples of SNPs matched to the phenotype-associated SNPs are displayed as a box plot, with the box extending from the 25th to the 75th percentile, the whiskers extending to 1.5 times the interquartile range, and any outliers beyond the whiskers shown as circles. If the enrichment for the phenotype-associated GWAS lead SNPs exceeded the 95th percentile of the results from the matched SNPs, the bar is colored red; otherwise, it is colored orange. (B) An example of Crohn’s disease SNPs in non-coding sequences that could regulate expression of *NOD2*. The panel shows gene models from the GENCODE group (version 12), locations of SNPs associated with Crohn’s disease by GWAS, results of the ChromHMM and Segway segmentations, selected histone modifications measured in GM12878 and HUVEC cells, locations of DNase hypersensitive sites in several cell types, and sites of occupancy by selected transcription factors. Regions discussed in the text are outlined by blue rectangles.
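
Panel (A) combines two simple rules: a bar is red when the GWAS enrichment exceeds the 95th percentile of enrichments from matched random SNP samples, and the null samples are summarized as a box plot with 1.5 × IQR whiskers. A minimal sketch of both rules follows, assuming the per-sample enrichments are already computed; the null values here are simulated Gaussian noise, not real matched-SNP results, the function names are hypothetical, and the exact percentile interpolation used for the original figure is unknown.

```python
import random
import statistics


def bar_color(observed, null_enrichments, pct=95):
    """Red if `observed` exceeds the pct-th percentile of the matched-SNP null, else orange."""
    # quantiles(n=100) returns the 1st..99th percentiles; index pct - 1 is the pct-th.
    cutoff = statistics.quantiles(null_enrichments, n=100, method="inclusive")[pct - 1]
    return "red" if observed > cutoff else "orange"


def box_stats(values):
    """Quartiles, 1.5 x IQR whiskers and outliers, as in a standard Tukey box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers end at the most extreme data points still within the fences.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Simulated stand-in for log2 enrichments of 1000 matched random SNP samples.
rng = random.Random(0)
null = [rng.gauss(0.0, 0.2) for _ in range(1000)]
print(bar_color(2.0, null), bar_color(-0.5, null))
print(box_stats([1, 2, 3, 4, 5, 100]))
```

Because the cutoff is an empirical percentile of the matched samples rather than a parametric p-value, the rule makes no distributional assumption about the null enrichments.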

**Figure 4.**
Distribution of various classes of transcripts in the segmentations. Enrichment (red) or depletion (blue) of RNA-seq transcript categories (‘biotypes’) in each state for two 25-state segmentations: (A) Segway GM12878 and (B) ChromHMM GM12878. White cells indicate the absence of an RNA biotype in the corresponding state. Distribution of expression levels in segmentation states: the expression level of each protein-coding RNA-seq contig intersecting a protein-coding gene in each state for (C) Segway GM12878 and (D) ChromHMM GM12878 was extracted from the data in Djebali *et al.* (29). The distribution of those values across all RNA contigs in the DNA segments of each state is shown as a box plot.
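
Panels (C) and (D) require intersecting RNA-seq contigs with segments and then grouping contig expression values by segment state. A minimal sketch of that intersection follows, assuming sorted, non-overlapping, half-open segments on one chromosome and assuming that a contig spanning several states contributes its value once to each state it touches; the function name, state names ("Tx", "Quies"), coordinates and expression values are all hypothetical.

```python
import bisect
import statistics
from collections import defaultdict


def expression_by_state(segments, contigs):
    """Group contig expression values by the states of the segments they overlap.

    segments: sorted, non-overlapping (start, end, state) half-open intervals
    contigs:  (start, end, expression) half-open intervals on the same chromosome
    """
    starts = [s for s, _, _ in segments]
    by_state = defaultdict(list)
    for c_start, c_end, value in contigs:
        # The last segment starting at or before c_start is the only earlier one
        # that can still overlap, because segments do not overlap each other.
        i = max(bisect.bisect_right(starts, c_start) - 1, 0)
        seen = set()
        while i < len(segments) and segments[i][0] < c_end:
            s_start, s_end, state = segments[i]
            if s_end > c_start and state not in seen:
                by_state[state].append(value)  # count each state once per contig
                seen.add(state)
            i += 1
    return dict(by_state)


segments = [(0, 100, "Tx"), (100, 200, "Quies"), (200, 300, "Tx")]
contigs = [(10, 50, 8.0), (150, 250, 2.0), (220, 280, 6.0)]
groups = expression_by_state(segments, contigs)
print({state: statistics.median(values) for state, values in groups.items()})
```

Binary search over segment starts keeps each lookup logarithmic in the number of segments, which matters at genome scale; the per-state value lists can then be summarized as box plots.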

**Figure 5.**
(A) Log₂ enrichment or depletion of four different conserved element sets, PhastCons (33), SiPhy-ω, SiPhy-π (16,35) and GERP (34), for the 25 ChromHMM states, averaged across all 6 cell types. (B) The same comparison for Segway states, restricted to the K562 segmentation.

Cited by

Deciphering the genetics and mechanisms of predisposition to multiple myeloma.
Went M, Duran-Lozano L, Halldorsson GH, Gunnell A, Ugidos-Damboriena N, Law P, Ekdahl L, Sud A, Thorleifsson G, Thodberg M, Olafsdottir T, Lamarca-Arrizabalaga A, Cafaro C, Niroula A, Ajore R, Lopez de Lapuente Portilla A, Ali Z, Pertesi M, Goldschmidt H, Stefansdottir L, Kristinsson SY, Stacey SN, Love TJ, Rognvaldsson S, Hajek R, Vodicka P, Pettersson-Kymmer U, Späth F, Schinke C, Van Rhee F, Sulem P, Ferkingstad E, Hjorleifsson Eldjarn G, Mellqvist UH, Jonsdottir I, Morgan G, Sonneveld P, Waage A, Weinhold N, Thomsen H, Försti A, Hansson M, Juul-Vangsted A, Thorsteinsdottir U, Hemminki K, Kaiser M, Rafnar T, Stefansson K, Houlston R, Nilsson B. Nat Commun. 2024 Aug 5;15(1):6644. doi: 10.1038/s41467-024-50932-7. PMID: 39103364.
PAtCh-Cap: input strategy for improving analysis of ChIP-exo data sets and beyond.
Terooatea TW, Pozner A, Buck-Koehntop BA. Nucleic Acids Res. 2016 Dec 1;44(21):e159. doi: 10.1093/nar/gkw741. Epub 2016 Aug 22. PMID: 27550178.
Interspecies regulatory landscapes and elements revealed by novel joint systematic integration of human and mouse blood cell epigenomes.
Xiang G, He X, Giardine BM, Isaac KJ, Taylor DJ, McCoy RC, Jansen C, Keller CA, Wixom AQ, Cockburn A, Miller A, Qi Q, He Y, Li Y, Lichtenberg J, Heuston EF, Anderson SM, Luan J, Vermunt MW, Yue F, Sauria MEG, Schatz MC, Taylor J, Göttgens B, Hughes JR, Higgs DR, Weiss MJ, Cheng Y, Blobel GA, Bodine DM, Zhang Y, Li Q, Mahony S, Hardison RC. Genome Res. 2024 Aug 20;34(7):1089-1105. doi: 10.1101/gr.277950.123. PMID: 38951027.
Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions.
Ernst J, Melnikov A, Zhang X, Wang L, Rogov P, Mikkelsen TS, Kellis M. Nat Biotechnol. 2016 Nov;34(11):1180-1190. doi: 10.1038/nbt.3678. Epub 2016 Oct 3. PMID: 27701403.
High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA.
Chandrananda D, Thorne NP, Bahlo M. BMC Med Genomics. 2015 Jun 17;8:29. doi: 10.1186/s12920-015-0107-z. PMID: 26081108.

References

1. ENCODE Project Consortium. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 2011;9:e1001046.
2. Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res. 2004;14:988–995.
3. Schweikert G, Zien A, Zeller G, Behr J, Dieterich C, Ong CS, Philips P, De Bona F, Hartmann L, Bohlen A, et al. mGene: accurate SVM-based gene finding with an application to nematode genomes. Genome Res. 2009;19:2133–2143.
4. Abeel T, Van de Peer Y, Saeys Y. Toward a gold standard for promoter prediction evaluation. Bioinformatics. 2009;25:i313–i320.
5. Yip KY, Cheng C, Bhardwaj N, Brown JB, Leng J, Kundaje A, Rozowsky J, Birney E, Bickel P, Snyder M, et al. Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol. 2012;13:R48.
