Genome Biol. 2015 May 27;16(1):110.
doi: 10.1186/s13059-015-0661-x.

Integrative modeling reveals the principles of multi-scale chromatin boundary formation in human nuclear organization

Benjamin L Moore et al. Genome Biol. 2015.

Abstract

Background: Interphase chromosomes adopt a hierarchical structure, and recent data have characterized their chromatin organization at very different scales, from sub-genic regions associated with DNA-binding proteins at the order of tens or hundreds of bases, through larger regions with active or repressed chromatin states, up to multi-megabase-scale domains associated with nuclear positioning, replication timing and other qualities. However, we have lacked detailed, quantitative models to understand the interactions between these different strata.

Results: Here we collate large collections of matched locus-level chromatin features and Hi-C interaction data, representing higher-order organization, across three human cell types. We use quantitative modeling approaches to assess whether locus-level features are sufficient to explain higher-order structure, and identify the most influential underlying features. We identify structurally variable domains between cell types and examine the underlying features to discover a general association with cell-type-specific enhancer activity. We also identify the most prominent features marking the boundaries of two types of higher-order domains at different scales: topologically associating domains and nuclear compartments. We find parallel enrichments of particular chromatin features for both types, including features associated with active promoters and the architectural proteins CTCF and YY1.

Conclusions: We show that integrative modeling of large chromatin dataset collections using random forests can generate useful insights into chromosome structure. The models produced recapitulate known biological features of the cell types involved, allow exploration of the antecedents of higher-order structures and generate testable hypotheses for further experimental studies.

Figures

Figure 1
Concordance of chromatin structure at multiple scales over three human cell types. The eigenvector compartment profile is shown for chromosome 2 for three human cell types (left). Genome-wide Pearson correlation coefficients between eigenvectors at 1-Mb resolution are in the range 0.75 to 0.80 (Additional file 1: Figure S2B). At higher resolution, the zoomed region illustrates conservation of topological domains (TADs) over 20 Mb of the same chromosome. Genome-wide, 33% of all H1 TAD boundaries have a matching boundary in GM12878 in the same or an adjacent 40-kb bin (K562: 31%, null model: 18%; Kolmogorov–Smirnov test: D=0.26, p≈0). Similarly, for H1 compartment boundaries, 37% have a matching boundary in the same or an adjacent 100-kb bin in GM12878 (K562: 35%, null model: 7%; Kolmogorov–Smirnov test: D=0.47, p≈0, Additional file 1: Figure S2A). Mb, megabases; TAD, topological domain.
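The concordance figures in this caption count a boundary as matched when the other cell type has a boundary in the same or an adjacent bin. A minimal Python sketch of that counting rule follows. The function name `fraction_matched`, the `tolerance` parameter and the bin indices are hypothetical and not taken from the paper; the sketch assumes boundaries on a single chromosome given as integer bin indices, and it does not reproduce the null model or the Kolmogorov–Smirnov test.

```python
def fraction_matched(query_bins, reference_bins, tolerance=1):
    """Fraction of query boundaries with a reference boundary within
    `tolerance` bins (tolerance=1 means the same or an adjacent bin)."""
    reference = set(reference_bins)
    if not query_bins:
        return 0.0
    matched = sum(
        any((b + offset) in reference
            for offset in range(-tolerance, tolerance + 1))
        for b in query_bins
    )
    return matched / len(query_bins)


# Hypothetical boundary positions (bin indices), for illustration only.
h1_boundaries = [10, 25, 40, 77]
gm12878_boundaries = [11, 30, 40, 90]

print(fraction_matched(h1_boundaries, gm12878_boundaries))  # 0.5
```

With `tolerance=0` only boundaries in the identical bin count, which makes the effect of allowing adjacent bins easy to see on real data.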
Figure 2
Accurate models of higher-order chromatin state built from locus-level features. (A) Model predictions (Predicted eig) are compared to observed values (Empirical eig). Regression accuracy is measured by the Pearson correlation coefficient (PCC) and root mean-squared error (RMSE); classification accuracy (for compartment A, eig ≥ 0, and compartment B, eig < 0) is evaluated by accuracy (percentage of true positives, Acc.) and area under the receiver operating characteristic curve (AUROC). (B) Variable importance (shown for the ten most informative features per model) is calculated as the decrease in prediction accuracy when a variable is permuted, relative to the observed values (in units of percentage increase in MSE), averaged over the forest (see Materials and methods). Acc., accuracy; AUROC, area under the receiver operating characteristic curve; eig, eigenvector; MSE, mean-squared error; PCC, Pearson correlation coefficient; RMSE, root mean-squared error.
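The four metrics named in this caption can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' code: the function names and the toy values are hypothetical, "accuracy" is taken here as the fraction of bins whose predicted compartment (A if eig ≥ 0, otherwise B) matches the observed one, and AUROC is computed by direct pairwise comparison, which is only practical for small inputs.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (PCC) between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(pred, obs):
    """Root mean-squared error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def compartment_accuracy(pred, obs):
    """Fraction of bins where predicted and observed compartments agree
    (assumed definition: A if eig >= 0, B if eig < 0)."""
    return sum((p >= 0) == (o >= 0) for p, o in zip(pred, obs)) / len(obs)


def auroc(pred, obs):
    """Probability that a randomly chosen observed-A bin receives a higher
    predicted eig than a randomly chosen observed-B bin (ties count 0.5)."""
    pos = [p for p, o in zip(pred, obs) if o >= 0]
    neg = [p for p, o in zip(pred, obs) if o < 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical eigenvector values, for illustration only.
empirical = [0.8, 0.2, -0.1, -0.6]
predicted = [0.6, -0.1, 0.1, -0.5]

print(compartment_accuracy(predicted, empirical))  # 0.5
print(auroc(predicted, empirical))                 # 0.75
print(pearson(predicted, empirical), rmse(predicted, empirical))
```

The toy example shows why both kinds of metric are reported: two bins close to zero are misclassified, halving the accuracy, while the regression error stays small.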
Figure 3
Models trained in one cell type can generalize to others. Each model, trained in one cell type, was applied to the chromatin feature datasets from the other two cell types. (A) The GM12878 model achieved high accuracy when applied to K562 features (PCC = 0.76), as did the reciprocal cross (PCC = 0.75). (B) In each case, predictive accuracy decreased on cross-application but there remains significant agreement between predicted and empirical values. Acc., accuracy; AUROC, area under the receiver operating characteristic curve; PCC, Pearson correlation coefficient; RMSE, root mean-squared error.
Figure 4
Regions with variable structure are less successfully modeled and are associated with cell-type-specific enhancer activity. (A) Model accuracy is significantly different between low- and high-variability regions, defined as, respectively, the lowest and highest thirds of the distribution of median absolute deviations between cell types. (B) Regions occupying altered compartments are defined as those assigned to A in one cell type but to B in the other two cell types, or vice versa. The numbers of enhancers (cell type specific or shared between two or more cell types) are shown for regions with altered (open or closed) and non-altered (none) compartments in each cell type. PCC, Pearson correlation coefficient.
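The low- and high-variability split described in panel (A) can be sketched as follows: compute the median absolute deviation (MAD) of each bin's eigenvector value across cell types, then take the lowest and highest thirds of bins by that score. The function names and input values are hypothetical, and the unscaled MAD used here is an assumption, since the caption does not say whether a scaling constant was applied.

```python
import statistics


def mad(values):
    """Unscaled median absolute deviation of a sequence of numbers."""
    values = list(values)
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def split_by_variability(eig_per_bin):
    """Return (low, high): indices of bins in the lowest and highest thirds
    of the MAD distribution. Each element of `eig_per_bin` holds one bin's
    eigenvector values, one value per cell type."""
    scores = [mad(v) for v in eig_per_bin]
    order = sorted(range(len(scores)), key=scores.__getitem__)
    third = len(order) // 3
    if third == 0:
        return [], []
    return sorted(order[:third]), sorted(order[-third:])


# Hypothetical eigenvector values per bin as (H1, GM12878, K562).
bins = [
    (0.50, 0.52, 0.54),
    (0.50, -0.50, 0.10),
    (0.10, 0.20, 0.40),
    (-0.30, 0.00, 0.90),
    (0.20, 0.25, 0.30),
    (0.90, -0.90, 0.00),
]

low, high = split_by_variability(bins)
print(low, high)  # [0, 4] [1, 5]
```

Model accuracy can then be computed separately on the `low` and `high` index sets to compare how well stable and variable regions are predicted.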
Figure 5
Structurally variable regions indicate cell-type-specific biology. Regions occupying the active A nuclear compartment in one cell type, but the repressed B compartment in the other two, were selected and ranked by the number of predicted active enhancers (Figure 4). (A) The region chr5:158–159 Mb, which occupies the open A compartment in GM12878 cells, is shown as an example (top five regions for each cell type are shown in Additional file 1: Figure S10). Displayed tracks are: known genes (UCSC), compartment eigenvectors, chromHMM/Segway combined chromatin state predictions, open chromatin FAIRE peaks, and H3K27ac signal. (B) Structurally variable regions show a greater than expected proportion of contacts with other active A compartments in the cell type in which they are active, relative to those same regions in the other two cell types. Box plot notches represent 95% confidence intervals of the median. Each variable region is also shown individually in Additional file 1: Figure S11. TSS, transcription start site.
Figure 6
Chromatin features underlying TAD and compartment boundaries. (A) Selected profiles for locus-level features are shown for TAD boundaries (CTCF, H3K9me3 and POL2) and compartment boundaries (H2A.Z, H3K4me2 and YY1), as a mean normalized ChIP-seq signal relative to input chromatin per bin (±1 standard error). TAD boundaries were examined over 40-kb bins over the 1 Mb flanking each boundary; compartment boundaries were examined over 100-kb bins over 3 Mb. (B) The significance of enrichment or depletion (−log10 P, two-tailed Mann–Whitney test) of a feature was calculated for the boundary bin relative to the ten most peripheral bins (five either side). Points are scaled by the absolute mean difference in signal over the boundary relative to the mean of peripheral bins. ChIP-seq, chromatin immunoprecipitation sequencing; TAD, topological domain.
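The boundary test in panel (B) compares signal in the boundary bin with signal in the ten most peripheral bins of each window. A rough Python sketch is given below. It is an approximation under stated assumptions rather than the paper's procedure: the function names and toy windows are hypothetical, each window is assumed to have odd length with the boundary bin at its center, and the two-tailed Mann–Whitney P value uses the normal approximation without tie or continuity correction (a statistics package would normally be used instead).

```python
import math


def midranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U statistic for x, two-tailed P) by normal approximation.
    Assumption: no tie correction and no continuity correction."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


def boundary_significance(windows, n_flank=5):
    """-log10 P for the central (boundary) bin of each window versus the
    n_flank outermost bins on either side, pooled over all windows."""
    boundary = [w[len(w) // 2] for w in windows]
    peripheral = [v for w in windows
                  for v in list(w[:n_flank]) + list(w[-n_flank:])]
    _, p = mann_whitney(boundary, peripheral)
    return -math.log10(max(p, 1e-300))  # guard against log10(0)


# Hypothetical signal profiles: six windows of 13 bins with a central peak.
enriched = [[1.0] * 6 + [5.0] + [1.0] * 6 for _ in range(6)]
print(boundary_significance(enriched))
```

Because the test is two-tailed, a large value indicates either enrichment or depletion at the boundary; the sign of the mean difference between boundary and peripheral bins distinguishes the two.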
