2021 Jul 7;155(1):010901.
doi: 10.1063/5.0044150.

Multiscale modeling of genome organization with maximum entropy optimization

Xingcheng Lin et al. J Chem Phys.

Abstract

Three-dimensional (3D) organization of the human genome plays an essential role in all DNA-templated processes, including gene transcription, gene regulation, and DNA replication. Computational modeling can be an effective way of building high-resolution genome structures and improving our understanding of these molecular processes. However, it faces significant challenges as the human genome consists of over 6 × 10⁹ base pairs, a system size that exceeds the capacity of traditional modeling approaches. In this perspective, we review the progress that has been made in modeling the human genome. Coarse-grained models parameterized to reproduce experimental data via the maximum entropy optimization algorithm serve as effective means to study genome organization at various length scales. They have provided insight into the principles of whole-genome organization and enabled de novo predictions of chromosome structures from epigenetic modifications. Applications of these models at a near-atomistic resolution further revealed physicochemical interactions that drive the phase separation of disordered proteins and dictate chromatin stability in situ. We conclude with an outlook on the opportunities and challenges in studying chromosome dynamics.
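To make the idea of maximum entropy optimization concrete, here is a minimal toy sketch; it is not the authors' implementation. It assumes a set of independent on/off contacts, a uniform reference ensemble, energies in units of kT, and a hypothetical learning rate. The names `simulated_contact_prob` and `fit_maxent` and the target values are illustrative only. The bias energies are adjusted iteratively until the ensemble-averaged contact probabilities match the targets.

```python
import numpy as np

def simulated_contact_prob(alpha):
    """Ensemble-averaged contact probabilities for independent on/off contacts.

    Contact k has energy alpha[k] when formed and 0 otherwise (units of kT),
    so its Boltzmann probability is exp(-alpha) / (1 + exp(-alpha)).
    """
    return 1.0 / (1.0 + np.exp(alpha))

def fit_maxent(target, lr=1.0, n_iter=5000, tol=1e-8):
    """Iteratively adjust bias energies until simulated averages match targets."""
    alpha = np.zeros_like(target, dtype=float)
    for _ in range(n_iter):
        diff = simulated_contact_prob(alpha) - target
        if np.max(np.abs(diff)) < tol:
            break
        # Too many contacts in the model -> raise the energy penalty, and vice versa.
        alpha += lr * diff
    return alpha

# Hypothetical "experimental" contact probabilities (assumed values).
target = np.array([0.8, 0.3, 0.05])
alpha = fit_maxent(target)
fitted = simulated_contact_prob(alpha)
```

In a realistic genome model the ensemble averages would come from polymer simulations rather than a closed-form expression, but the update loop has the same shape: simulate, compare with experiment, adjust the parameters.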

Figures

FIG. 1.
Illustration of the many layers of three-dimensional genome organization. In eukaryotes, the double-stranded DNA first wraps around histone proteins (orange) to form nucleosomes. The N-terminal tails of histone proteins are subject to a wide range of post-translational modifications (PTM), including acetylation (Ac), methylation (Me), and ubiquitination (Ub). A string of nucleosomes, or chromatin, may compact into irregular structures and nucleosomal condensates, although regular structures have also been seen in vitro. At larger scales, genomic segments that are far apart in sequence can come in contact due to the formation of chromatin loops, transcriptional condensates, and topologically associating domains (TAD). TADs of similar properties may phase separate, resulting in the compartmentalization of chromosomes into regions enriched with heterochromatin (B compartment, blue) or euchromatin (A compartment, red). These two chromatin types differ in their compactness, gene density, and nuclear localization. Inside the nucleus, individual chromosomes often occupy non-overlapping regions to form territories.
FIG. 2.
Computational modeling of genome organization with Hi-C data. Top: illustration of the experimental protocol used in population Hi-C experiments (see the text for details). An example contact probability map for the genome from GM12878 cells is shown on the right, with the probability decreasing from yellow to red and to white. Bottom: illustration of the two popular methods used in building genome structures from Hi-C data. In consensus structure methods (left), pairwise Hi-C contact frequencies (pij) are first transformed to distances (dij) via a mapping function F. These distances can be used as constraints to refine computer models and derive consensus structures for the genome. An ensemble of structures can also be used (right) to reproduce Hi-C contact frequencies without converting them to distances. These structural ensemble methods often describe the structures with a probabilistic model and use iterative algorithms to update model parameters.
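The consensus-structure route described above can be sketched as follows. The caption only states that a mapping function F converts contact frequencies to distances; the inverse power law used here, the `exponent` parameter, and the function name `contacts_to_distances` are assumptions chosen for illustration.

```python
import numpy as np

def contacts_to_distances(p, exponent=1.0, eps=1e-12):
    """Map contact frequencies to target distances via d = p**(-1/exponent).

    The power-law form is an assumed example of a mapping function F.
    """
    p = np.asarray(p, dtype=float)
    d = np.full(p.shape, np.inf)      # unobserved pairs get no finite constraint
    observed = p > eps
    d[observed] = p[observed] ** (-1.0 / exponent)
    np.fill_diagonal(d, 0.0)          # a locus is at zero distance from itself
    return d

# Hypothetical 3 x 3 contact-frequency matrix.
p = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.25],
              [0.1, 0.25, 1.0]])
d = contacts_to_distances(p, exponent=1.0)
```

The resulting distances would then serve as restraints when refining a single structure, in contrast to ensemble methods, which fit the contact frequencies directly.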
FIG. 3.
Illustration of the various mechanisms proposed for genome folding. (a) The metastable fractal globule forms in a process that drives the rapid collapse of a long polymer from an expanded, knotless configuration. The two ends do not have enough time to take part in the collapse, and the polymer remains knotless. (b) In the extrusion model, chromatin loops form as a result of the processive movement of Cohesin molecules along the DNA. CTCF molecules act as blockers to stop Cohesin extrusion, explaining the accumulation of the two proteins at loop boundaries. (c) Microphase separation of block copolymers can lead to contact patterns similar to the compartmentalization seen in Hi-C maps.
FIG. 4.
Data-driven mechanistic modeling of the whole-genome organization. (a) An example configuration of the diploid human genome colored from red to white and to blue with increasing chromosome ID. Each chromosome is modeled as a string of beads that can either be A (cyan) compartments, B (purple) compartments, or centromeres (green). (b) Example genome configurations colored by bead types (top), comparison between simulated and experimental chromosome radial positions (bottom), and comparison between simulated (upper triangle) and experimental (lower triangle) genome-wide contact maps for three genome models. In model 1, only one set of parameters was used to model intra- and inter-chromosomal interactions, while two sets of independent parameters were used in model 2. In model 3, in addition to the use of independent parameters for intra- and inter-chromosomal interactions, the centromeric regions were explicitly represented with a new type.
FIG. 5.
Predicting genome organization with a chromatin-state based polymer model. (a) Overview of the key elements of the computational model. The chromatin is modeled as a string of beads, each assigned with a chromatin state based on the corresponding combinatorial pattern of histone marks. Genomic regions bound by CTCF molecules are also identified to model CTCF mediated loop formation. The polymer model succeeds in quantitatively reproducing compartments (b), TADs (c), and chromatin loops (d) for chromosome 1 from GM12878 cells. (e) The polymer model is transferable across chromosomes and cell types as evidenced by the high correlation between simulated and experimental contact maps measured by Pearson correlation coefficients (PCC—left panel) and stratum-adjusted correlation coefficient (SCC—right panel).
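As a rough illustration of how simulated and experimental contact maps can be compared, the sketch below computes a Pearson correlation over the upper triangle and a per-separation Pearson correlation. The latter is only a simplified stand-in for the stratum-adjusted correlation coefficient, which additionally smooths the maps and weights the strata; the function names and the synthetic maps are assumptions.

```python
import numpy as np

def pearson_upper(sim, ref):
    """Pearson correlation over the upper triangle, excluding the diagonal."""
    iu = np.triu_indices_from(sim, k=1)
    return np.corrcoef(sim[iu], ref[iu])[0, 1]

def per_separation_pearson(sim, ref, max_sep):
    """Pearson correlation computed separately at each genomic separation."""
    out = {}
    for s in range(1, max_sep + 1):
        a = np.diagonal(sim, offset=s)
        b = np.diagonal(ref, offset=s)
        if a.size > 2 and a.std() > 0 and b.std() > 0:
            out[s] = np.corrcoef(a, b)[0, 1]
    return out

# Synthetic, symmetric "experimental" map with distance decay (assumed form).
n = 40
i, j = np.indices((n, n))
sep = np.abs(i - j)
ref_map = (1.0 + 0.2 * np.cos(0.3 * (i + j))) / (1.0 + sep)
# A "simulated" map that is a linear rescaling of the reference.
sim_map = 0.9 * ref_map + 0.05

pcc = pearson_upper(sim_map, ref_map)
by_sep = per_separation_pearson(sim_map, ref_map, max_sep=5)
```

Stratifying by separation matters because the strong decay of contact probability with genomic distance can otherwise dominate a global correlation.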
FIG. 6.
Illustration of the two structural models proposed for chromatin fiber. A total of 24 nucleosomes are shown in panel (a) and 12 nucleosomes in panel (b). Histone proteins from the odd and even nucleosomes are shown in blue and green, respectively. The DNA molecule is indicated in gold.
FIG. 7.
Illustration of the three different types of mesoscopic chromatin models that differ in their representation and energetic contributions. (a) In geometric models, the energetics of the chromatin fiber is fully specified by two angles that correspond to the DNA entry–exit angle of individual nucleosomes (α) and the relative rotational angle between two connecting nucleosomes (β). (b) Particle-based models allow for a more accurate treatment of the flexibility of linker DNA, histone tails, and histone H1. Inter-nucleosome interactions can be introduced to account for contributions from globular domains of histone proteins. (c) Models with DNA molecules at a single base-pair resolution have been introduced to characterize the bending and twisting of linker DNA in greater detail.
FIG. 8.
Thermodynamics of nucleosome unwinding. The free energy (FE) profile (white dots) as a function of the DNA end-to-end distance supports a three-stage scenario for DNA unwinding. The first stage (blue) corresponds to the unwinding of the outer layer. In the second stage (orange), no significant DNA unwinding occurs, but free energy rises sharply. Finally, the inner layer begins to unwind at a modest free energy cost in the third stage (green). Example nucleosome configurations at different stages are provided on the side, with the DNA indicated in gold and histone proteins indicated in orange, red, blue, and green. The free energy barrier in the transition region is mostly dominated by energetic contributions (PE, green line), which are compensated by an increase in entropy (−TS, red line) from the freed histone tails.
FIG. 9.
Stability and folding pathways of the tetra-nucleosome. (a) Illustration of the neural network approach for parameterizing high-dimensional free energy surfaces from mean forces. The neural network takes the six internucleosomal distances (S1, S2, …, Sn) as an input to compute the corresponding free energy [A(S)] and mean forces (∂A/∂Sα). (b) Projection of the six-dimensional free energy profile onto the distances between nucleosomes 1 and 3 (d13) and nucleosomes 2 and 4 (d24). The sequential (pink) and concerted (yellow) pathways for tetra-nucleosome folding are shown on top of the free energy profile with energy unit kcal/mol. (c) Example tetra-nucleosome configurations along the two folding pathways. The DNA molecule is shown in gold, and the histone octamers are shown in green, white, blue, and red.
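The idea of recovering a free energy surface from mean forces can be illustrated with a much simpler model than a neural network. The sketch below swaps in a one-dimensional polynomial fitted by linear least squares to synthetic gradient data; the collective variable `s`, the double-well test function, and the helper `free_energy` are all assumptions for illustration.

```python
import numpy as np

# Hypothetical 1D collective variable s and a known free energy A(s) = (s^2 - 1)^2,
# used only to generate synthetic gradient data dA/ds (the mean force is -dA/ds).
s = np.linspace(-1.5, 1.5, 61)
dA_ds = 4.0 * s * (s**2 - 1.0)

# Model: A(s) = sum_k c_k s^k for k = 1..4. The additive constant cannot be
# determined from gradients. The derivative, sum_k k c_k s^(k-1), is linear in c.
powers = np.arange(1, 5)
design = powers * s[:, None] ** (powers - 1)

# Least-squares fit of the model gradient to the sampled gradient.
c, *_ = np.linalg.lstsq(design, dA_ds, rcond=None)

def free_energy(x):
    """Free energy reconstructed from the fitted coefficients, with A(0) = 0."""
    return (np.asarray(x, dtype=float)[..., None] ** powers) @ c

# Barrier height between the minimum at s = 1 and the maximum at s = 0.
barrier = free_energy(0.0) - free_energy(1.0)
```

A neural network plays the same role as the polynomial here but scales to the six-dimensional space of internucleosomal distances, where a fixed basis would be impractical.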
FIG. 10.
Coarse-grained protein force field, MOFF, enables large scale simulation of phase separation. (a) Illustration of the maximum entropy optimization algorithm for protein force field parameterization. The algorithm reparameterizes biasing energies (αf) determined from maximum entropy optimization with a weighted linear combination of contacts (ɛC). When solving the reparameterization algorithm with least squares regression, additional constraints can be included for globular proteins to ensure that the native conformations have the lowest energy (step 2b in yellow). (b) Example configurations of HP1 molecules in the dilute and condensed phase.
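The reparameterization step can be sketched as an ordinary least-squares problem: given biasing energies for a set of structures, find contact energies whose weighted sum of contact counts reproduces them. The array shapes, the synthetic data, and the variable names are assumptions, and the additional constraints for globular proteins mentioned in the caption are not included.

```python
import numpy as np

rng = np.random.default_rng(0)
n_structures, n_contact_types = 200, 5

# Hypothetical contact counts: C[n, t] = number of type-t contacts in structure n.
C = rng.integers(0, 20, size=(n_structures, n_contact_types)).astype(float)

# Synthetic biasing energies generated from known contact energies, so that the
# fit can be checked. In practice these would come from maximum entropy optimization.
eps_true = np.array([-0.5, 0.2, -0.1, 0.0, 0.3])
bias = C @ eps_true

# Least-squares estimate of the contact energies from the biasing energies.
eps_fit, *_ = np.linalg.lstsq(C, bias, rcond=None)
```

Expressing the bias through contact energies makes the result transferable: the fitted energies can be applied to sequences that were not part of the optimization.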
