Abstract
Somatic mutations in driver genes may ultimately lead to the development of cancer. Understanding how somatic mutations accumulate in cancer genomes and the underlying factors that generate somatic mutations is therefore crucial for developing novel therapeutic strategies. To understand the interplay between spatial genome organization and specific mutational processes, we studied 3,000 tumor–normal-pair whole-genome datasets from 42 different human cancer types. Our analyses reveal that the change in somatic mutational load in cancer genomes is co-localized with topologically-associating-domain boundaries. Domain boundaries constitute a better proxy to track mutational load change than replication timing measurements. We show that different mutational processes lead to distinct somatic mutation distributions where certain processes generate mutations in active domains, and others generate mutations in inactive domains. Overall, the interplay between three-dimensional genome organization and active mutational processes has a substantial influence on the large-scale mutation-rate variations observed in human cancers.
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
$209.00 per year
only $17.42 per issue
Buy this article
- Purchase on SpringerLink
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
Data availability
Code availability
Python scripts used in this manuscript are available at https://github.com/kcakdemir/MutationalDistribution/.
References
Perera et al. Differential DNA repair underlies mutation hotspots at active promoters in cancer genomes. Nature 532, 259–263 (2016).
Sabarinathan, R., Mularoni, L., Deu-Pons, J., Gonzalez-Perez, A. & López-Bigas, N. Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature 532, 264–267 (2016).
Makova, K. D. & Hardison, R. C. The effects of chromatin organization on variation in mutation rates in the genome. Nat. Rev. Genet. 16, 213–223 (2015).
Smith, K. S., Liu, L. L., Ganesan, S., Michor, F. & De, S. Nuclear topology modulates the mutational landscapes of cancer genomes. Nat. Struct. Mol. Biol. 24, 1000–1006 (2017).
Dekker, J. & Heard, E. Structural and functional diversity of topologically associating domains. FEBS Lett. 589, 2877–2884 (2015).
Dixon, J. R., Gorkin, D. U. & Ren, B. Chromatin domains: the unit of chromosome organization. Mol. Cell 62, 668–680 (2016).
Fudenberg, G. et al. Formation of chromosomal domains by loop extrusion. Cell Rep. 15, 2038–2049 (2016).
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Nora, E. P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012).
Pope, B. D. et al. Topologically associating domains are stable units of replication-timing regulation. Nature 515, 402–405 (2014).
Miura, H. et al. Single-cell DNA replication profiling identifies spatiotemporal developmental dynamics of chromosome organization. Nat. Genet. 51, 1356–1368 (2019).
Martincorena, I. & Campbell, P. J. Somatic mutation in cancer and normal cells. Science 349, 1483–1489 (2015).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).
Akdemir, K. C. et al. Disruption of chromatin folding domains by somatic genomic rearrangements in human cancer. Nat. Genet. 52, 294–305 (2020).
Stamatoyannopoulos, J. A. et al. Human mutation rate associated with DNA replication timing. Nat. Genet. 41, 393–395 (2009).
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Jégu, T., Aeby, E. & Lee, J. T. The X chromosome in space. Nat. Rev. Genet. 18, 377–389 (2017).
Darrow, E. M. et al. Deletion of DXZ4 on the human inactive X chromosome alters higher-order genome architecture. Proc. Natl Acad. Sci. USA 113, E4504–E4512 (2016).
Jäger, N. et al. Hypermutation of the inactive X chromosome is a frequent event in cancer. Cell 155, 567–581 (2013).
Kamarthapu, V. & Nudler, E. Rethinking transcription coupled DNA repair. Curr. Opin. Microbiol. 24, 15–20 (2015).
Alexandrov et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Hu, J., Adar, S., Selby, C. P., Lieb, J. D. & Sancar, A. Genome-wide analysis of human global and transcription-coupled excision repair of UV damage at single-nucleotide resolution. Genes Dev. 29, 948–960 (2015).
Haradhvala et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 164, 538–549 (2016).
Petljak et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. Cell 176, 1282–1294 (2019).
Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015).
Mouradov, D. et al. Colorectal cancer cell lines are representative models of the main molecular subtypes of primary cancer. Cancer Res. 74, 3238–3247 (2014).
GTEx Consortium. The genotype-tissue expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Lujan, S. A. et al. Heterogeneous polymerase fidelity and mismatch repair bias genome variation and composition. Genome Res. 24, 1751–1764 (2014).
Frigola, J. et al. Reduced mutation rate in exons due to differential mismatch repair. Nat. Genet. 49, 1684–1692 (2017).
Chan, K. & Gordenin, D. A. Clusters of multiple mutations: incidence and molecular mechanisms. Annu. Rev. Genet. 49, 243–267 (2015).
Sakofsky, C. J. et al. Repair of multiple simultaneous double-strand breaks causes bursts of genome-wide clustered hypermutation. PLoS Biol. 17, e3000464 (2019).
Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature 578, 122–128 https://doi.org/10.1038/s41586-019-1907-7 (2020).
Alexandrov, L. B. et al. Mutational signatures associated with tobacco smoking in human cancer. Science 354, 618–622 (2016).
Volkova, N. V. et al. Mutational signatures are jointly shaped by DNA damage and repair. Nat. Commun. 11, 2169 (2020).
Yarchoan, M., Hopkins, A. & Jaffee, E. M. Tumor mutational burden and response rate to PD-1 inhibition. N. Engl. J. Med. 377, 2500–2501 (2017).
Chaligné, R. et al. The inactive X chromosome is epigenetically unstable and transcriptionally labile in breast cancer. Genome Res. 25, 488–503 (2015).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
Akdemir, K. C. & Chin, L. HiCPlotter integrates genomic data with interaction matrices. Genome Biol. 16, 198 (2015).
Flyamer, I. M. et al. Single-nucleus Hi-C reveals unique chromatin reorganization at oocyte-to-zygote transition. Nature 544, 110–114 (2017).
Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Hutter, C. & Zenklusen, J. C. The cancer genome atlas: creating lasting value beyond its data. Cell 173, 283–285 (2018).
Corces, M. R. et al. The chromatin accessibility landscape of primary human cancers. Science 362, eaav1898 (2018).
Rosenthal, R., McGranahan, N., Herrero, J., Taylor, B. S. & Swanton, C. DeconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17, 31 (2016).
Franco, I. et al. Whole-genome DNA sequencing provides an atlas of somatic mutagenesis in healthy human cells and identifies a tumor-prone cell type. Genome Biol. 20, 285 (2019).
Dale, R. K., Pedersen, B. S. & Quinlan, A. R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011).
Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6, pl1 (2013).
Chan, K. et al. An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. Nat. Genet. 47, 1067–1072 (2015).
Acknowledgements
We thank the participants and their families for contributing to this study. We thank Z. Coban-Akdemir, E. Keung, P. Polak, T. Gutschner, J. Posey and R. Jenq for their critical reading of this manuscript. We also thank our families, especially our children (special thanks to S. Akdemir from K.C.A.) for their support and understanding during the COVID-19 outbreak. This work was supported by a Cancer Prevention Research Institute of Texas award (R1205), the Welch Foundation’s Robert A. Welch Distinguished Chair Award (G-0040 to P.A.F.), the Emerson Collective Cancer Research Fund (to K.C.A.), SPORE in Brain Cancer CEP (P50CA127001 to K.C.A.), a US National Institutes of Health (NIH) Director’s Early Independence Award (DP5OD023071 to J.R.D.) and an NIH Intramural Research Program Project (Z1AES103266 to D.A.G.).
Author information
Authors and Affiliations
Contributions
K.C.A., J.R.D. and P.A.F. designed the study. K.C.A. and J.R.D. performed the computational analysis. V.T.L. and J.R.D. performed the Hi-C experiments on DLD1, CaCo2, LoVo and A2058 cell lines. All authors discussed the results and commented on the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Chromatin domains reflect a similar domain-wide epigenetic profile across different cell types.
a, Overlaps between common TAD boundaries in this manuscript and TAD boundaries from ENCODE cancer cell lines. Horizontal bars represent the total number of TAD boundaries per cell type. Vertical bars represent the number of intersecting boundaries between cell types. Combination matrix (below), circles indicate that denote cell types are part of the intersection for each vertical bar. b, Aggregate plots show average cell-type specific enrichment levels for Insulation score, CTCF binding sites, DNAseI hypersensitivity regions and H3K9me3 ChIP-seq enrichment levels from GM12878 cell line around common TAD boundaries. c, Classification of subcompartments (domains) based on chromatin state coverage. Heatmap shows domain-length normalized coverage of each chromatin state from Roadmap Epigenome aggregate data (rows) for each domain (columns). Chromatin states are clustered based on co-enrichments within domains. Domains are classified into five groups: heterochromatin (purple), inactive (gray), repressed (blue), active (orange), and active-2 (red) according to dominant chromatin state combinations. d, Box plots represent the enrichment of selected ChromHMM chromatin states in each domain type. These states are selected based on their highest enrichment in the heatmap plotted in Extended Data Fig. 1c. In these and all other boxplots, the center line is the median; box limits are the upper and lower quantiles; whiskers represent 1.5x the interquartile range. Heterochromatin:180, Inactive:1219:, Repressed:969, Active:1086, Active-2:593. e, Heatmaps represent domain-length normalized coverage of cell-type specific ChromHMM chromatin states within domain for representative Roadmap Epigenome cell types. Cell type’s name and the Roadmap Epigenome IDs are denoted on top of each heatmap. Domain-state matrices are not clustered but sorted based on our five domain annotations. These annotations are denoted as colored bars beneath the heatmap. Box plots below represent the domain-length normalized coverage of chromatin states for the corresponding cell type. Domain numbers are Heterochromatin:180, Inactive:1219:, Repressed:969, Active:1086, Active-2:593.
Extended Data Fig. 2 Enrichment of domain-wide epigenetic profile across different cell types.
a, b, Heatmaps represent domain-length normalized coverage of cell-type specific ChromHMM chromatin states within domain for representative Roadmap Epigenome cell types. Cell type’s name and the Roadmap Epigenome IDs are denoted on top of each heatmap. Domain-state matrices are not clustered but sorted based on our five domain annotations. These annotations are denoted as colored bars beneath the heatmap. Box plots below represent the domain-length normalized coverage of chromatin states for the corresponding cell type. In these and all other boxplots, the center line is the median; box limits are the upper and lower quantiles; whiskers represent 1.5x the interquartile range. In these and following plots domain numbers are Heterochromatin:180, Inactive:1219:, Repressed:969, Active:1086, Active-2:593. c, Dot plots show the normalized log2 ATAC-Seq signal in each domain type. Median ATAC-Seq signal was calculated across the samples of each histology-type. Name of the cancer type is denoted on top of the plot. Error bar limits are 25th and 75th percentiles; the center line is the median. d, Dot plots show the average beta value from Illumina 450 K DNA methylation array in each domain type. Average beta values were calculated across the samples of each histology-type. Name of the cancer type is denoted on top of the plot. Error bar limits are 25th and 75th percentiles; the center line is the median.
Extended Data Fig. 3 The distribution of somatic mutations in cancer genomes is correlated with the three-dimensional genome organization.
a, Mutational load is correlated with spatial chromatin organization. Average profiles of somatic mutation accumulation in 3000 cancer samples (green) across 500Kb of TAD boundaries. Flanking domain types are denoted on top of each plot. b, Dot plots show the average replication timing patterns across different domain types. Domain numbers are Heterochromatin:180, Inactive:1219:, Repressed:969, Active:1086, Active-2:593. P-value was derived from Wilcoxon rank-sum test.
Extended Data Fig. 4 The distribution of somatic mutations in different cancer histology types.
a, Average profiles of somatic mutation accumulation in different cancer histology types across 500Kb of TAD boundaries delineating inactive and active domains. Y-axis represents average mutational load in 25 kb windows across all samples of a given histology type. Name of the cancer type is denoted on top of the plot. b, Heatmap represents the mutational load for every melanoma sample in the fourth and first quartile domains. Domains are ranked based on their mutation load in the whole melanoma cohort (summing mutations from all samples). On top, colored bars denote the domain annotations. Bar plots on the left show the number of domain annotations in the fourth and first quantiles.
Extended Data Fig. 5 Unique folding of inactive X chromosome shapes the distribution of somatic mutations.
a, Hi-C map of active (Xa) X-chromosome showing an active domain (R1) between inactive domains from a male human umbilical vein cell line (HUVEC, cell line from a male donor). Below, histograms show the distributions of clonal and subclonal somatic mutation accumulation (in 25 kb non-overlapping windows) per female (red) and male (blue) samples. Mutation numbers were normalized based on the number of samples. b, c, Histograms show the distributions of b) clonal and c) subclonal somatic mutation accumulation (in 25 kb non-overlapping windows) per female (red) and male (blue) samples along the entire X chromosome. d, e, Autosomal mutation accumulation patterns are not distinct between males and females. Plots show mutational load along chromosome 7 for the clonal (d) and subclonal (e) mutations in female (red) and male (blue) patients. f–j, Distribution of X chromosome mutation load compare to total mutation burden for males (observed and copy-number corrected values) and females in f) lower grade glioma (n = 18), g) glioblastoma (n = 40), h) pediatric brain tumors (n = 253), i) diffuse large b-cell lymphoma (n = 107), j) head and neck squamous carcinoma cohorts (n = 57). The center line is the median; box limits are the upper and lower quantiles; whiskers represent 1.5x the interquartile range. P-values were derived from Wilcoxon rank-sum test.
Extended Data Fig. 6 Distinct chromatin folding patterns and the distribution of somatic mutations.
a–c, Correlation between X chromosome copy number and X chromosome mutation burden in a) breast adenocarcinoma (n = 197), b) uterine endometrial carcinoma (n = 51) and c) ovarian adenocarcinoma (n = 113). P-values were derived from Student’s t-test. d, Correlation between expression of XIST with X chromosome mutation burden in breast adenocarcinoma (n = 197). P-value was derived from Student’s t-test. e, Dot plots show the normalized mutation load difference between melanoma and esophageal-adenocarcinoma cohort around the genomic regions identified as significant mutational load loci. Same number of regions are randomly selected across the genome and normalized mutation load difference was calculated for the same samples. In these and all other boxplots, the center line is the median; box limits are the upper and lower quantiles; whiskers represent 1.5x the interquartile range. P-value was derived from Wilcoxon rank-sum test. f, Dot plots represent log2 fold-change between a melanoma (A2058) and an esophageal-adenocarcinoma (OE33) cell line Hi-C data around the genomic regions identified as significant mutational load loci. Same number of regions are randomly selected across the genome and log2 Hi-C fold-change numbers were calculated for the A2058 and OE33 cell lines. P-value was derived from Wilcoxon rank-sum test. g, Mutational load difference SORCS1 gene between melanoma and esophageal-adenocarcinoma cohorts. Normalized mutation counts represented for melanoma (green) and esophageal-adenocarcinoma (purple) samples. Difference between the mutation load and the derivative of the difference is shown below. Highlighted regions mark significant mutation load change sites. Below heatmaps represent chromatin folding organization around SORCS1 gene in a A2058 and OE33 cell lines. Histograms represent the overall mutation load in melanoma and esophageal-adenocarcinoma cohorts. Box plots represent mutation load (melanoma:219; esophageal-adenocarcinoma:127) and normalized expression levels (melanoma:448; esophageal-adenocarcinoma:182) of the SORCS1 in the melanoma and esophageal-adenocarcinoma cohorts. P-values were derived from Wilcoxon rank-sum test.
Extended Data Fig. 7 Comparing mutational load patterns between cancer types can yield local chromatin conformation differences.
a–c, Mutational load difference around a) CDH11, b CPQ, c) PLOD2 genes between melanoma and esophageal-adenocarcinoma cohorts. Normalized mutation counts represented for melanoma (green) and esophageal-adenocarcinoma (purple) samples. Difference between the mutation load and the derivative of the difference is shown below. Highlighted regions mark significant mutation load change sites. Below heatmaps represent chromatin folding organization around a) CDH11, b) CPQ, c) PLOD2 genes in a melanoma (A2058) and esophageal-adenocarcinoma (OE33) cell lines. Histograms represent the overall mutation load in our melanoma and esophageal-adenocarcinoma cohorts. Box plots represent mutation load in individual samples (melanoma:219; esophageal-adenocarcinoma:127) and normalized expression levels (melanoma:448; esophageal-adenocarcinoma:182) of the a) CDH11, b) CPQ, c) PLOD2 genes in the melanoma and esophageal-adenocarcinoma cohorts. The center line is the median; box limits are the upper and lower quantiles; whiskers represent 1.5x the interquartile range. P-values were derived from Wilcoxon rank-sum test.
Extended Data Fig. 8 Patterns of mutational distributions by individual mutational signatures.
Dot plots demonstrate the association between the contribution of mutation signatures and mutation distribution in active and inactive domains (x-axis). Signatures exhibiting a preference in the active domains, namely a, SBS30.36, c, SBS10b.28, f, SBS16, g, SBS39, h, SBS40, and were colored in orange color. On the other hand, signatures exhibiting a preference toward the inactive domains, namely c, SBS10a, d, SBS17, e, SBS12 were colored in gray color. b, Dot plots show the NER binding in different domain types. Domain numbers: Heterochromatin:180, Inactive:1219:, Repressed:969, Active:1086, Active-2:593. P-value was derived from Wilcoxon rank-sum test.
Extended Data Fig. 9 Higher order chromatin organization and DNA mismatch repair activity.
a, Average profiles of mutation loads across boundaries delineating inactive to active domains MSS and MSI samples from colorectal adenocarcinoma, uterine adenocarcinoma, gastric adenocarcinoma, stomach adenocarcinoma, glioblastoma multiforme (CNS-GBM), pancreas adenocarcinoma and liver cancer; with dashed lines for MSI samples. b, Number of mutations in colorectal adenocarcinoma samples per megabase active versus inactive domains. MSI sample is denoted with cyan color. c, Schematic describes the TAD boundary strength calculation. Briefly, for each TAD boundary we obtained a matrix around 400Kb up- and down-stream of the boundary by centering the TAD boundary in the middle. Next we calculated the log2 ratio of sum of interactions occurring within TADs and between TADs for each boundary region. d, Average TAD boundary strength profiles across all TAD boundaries (n = 2477) in healthy colon tissue and colon cancer cell line Hi-C data. The center line is the median; box limits are the upper and lower quantiles; whiskers represent 1.5x the interquartile range.
Extended Data Fig. 10 APOBEC-mutagenesis affects the inactive X-chromosome mutation levels.
a, Dot plot demonstrates the association between the contribution of APOBEC-related mutation signatures (SBS2.13) and mutation distribution in active and inactive domains (x-axis). b, The distribution of APOBEC-signature enrichment subtypes (dark-green: A3A-like, gray: Not APOBEC-enriched) versus slope of mutations (active versus inactive domains). c, Scatter plot shows the preference of mutational signature toward TAD boundaries. Pearson correlation and the median slope calculations were described in the methods section. d, Distribution of X chromosome mutation load compared to total mutation burden for males (n:20) (observed and copy-number corrected values) and females (n:18) in lung adenocarcinoma patients. The center line is the median; box limits are the upper and lower quantiles; whiskers represent 1.5x the interquartile range. e, Dot plots show the enrichment of APOBEC signatures 2.13 and ratio of X-chromosome mutation load for female lung adenocarcinoma patients.
Supplementary information
Rights and permissions
About this article
Cite this article
Akdemir, K.C., Le, V.T., Kim, J.M. et al. Somatic mutation distributions in cancer genomes vary with three-dimensional chromatin structure. Nat Genet 52, 1178–1188 (2020). https://doi.org/10.1038/s41588-020-0708-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41588-020-0708-0
This article is cited by
-
Computational immunogenomic approaches to predict response to cancer immunotherapies
Nature Reviews Clinical Oncology (2024)
-
Cell cycle gene alterations associate with a redistribution of mutation risk across chromosomal domains in human cancers
Nature Cancer (2024)
-
Tissue mosaicism following stem cell aging: blood as an exemplar
Nature Aging (2024)
-
Replication timing alterations are associated with mutation acquisition during breast and lung cancer evolution
Nature Communications (2024)
-
Redistribution of mutation risk in cancer
Nature Cancer (2024)