Nature. 2020 Feb;578(7793):112-121.
doi: 10.1038/s41586-019-1913-9. Epub 2020 Feb 5.

Patterns of somatic structural variation in human cancer genomes

Yilong Li et al. Nature. 2020 Feb.

Erratum in

  • Author Correction: Patterns of somatic structural variation in human cancer genomes.
    Li Y, Roberts ND, Wala JA, Shapira O, Schumacher SE, Kumar K, Khurana E, Waszak S, Korbel JO, Haber JE, Imielinski M; PCAWG Structural Variation Working Group; Weischenfeldt J, Beroukhim R, Campbell PJ; PCAWG Consortium. Nature. 2023 Feb;614(7948):E38. doi: 10.1038/s41586-022-05597-x. PMID: 36697835. Free PMC article. No abstract available.

Abstract

A key mutational process in cancer is structural variation, in which rearrangements delete, amplify or reorder genomic segments that range in size from kilobases to whole chromosomes [1–7]. Here we develop methods to group, classify and describe somatic structural variants, using data from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumour types [8]. Sixteen signatures of structural variation emerged. Deletions have a multimodal size distribution, assort unevenly across tumour types and patients, are enriched in late-replicating regions and correlate with inversions. Tandem duplications also have a multimodal size distribution, but are enriched in early-replicating regions, as are unbalanced translocations. Replication-based mechanisms of rearrangement generate varied chromosomal structures with low-level copy-number gains and frequent inverted rearrangements. One prominent structure consists of 2–7 templates copied from distinct regions of the genome strung together within one locus. Such cycles of templated insertions correlate with tandem duplications and, in liver cancer, frequently activate the telomerase gene TERT. A wide variety of rearrangement processes are active in cancer, which generate complex configurations of the genome upon which selection can act.

Conflict of interest statement

R.B. owns equity in Ampressa Therapeutics; M.M. is the scientific advisory board chair of, and consultant for, OrigiMed, and receives research funding from Bayer and Ono Pharma, and patent royalties from LabCorp; J.W. is a consultant for Nference Inc.; C.-Z.Z. is a cofounder and equity holder of Pillar Biosciences, a for-profit company specializing in the development of targeted sequencing assays.

Figures

Fig. 1. Classification of structural variants in cancer genomes.
Schematics of major structural-variant (SV) classes, grouped according to whether they are simple or complex and arise through cut-and-paste or copy-and-paste processes. Each schematic comprises three parts. The top segment shows dotted arcs for each rearrangement junction that joins two chromosomal segments together. The middle segment shows the copy number of genomic segments that are involved. The bottom segment shows the configuration of the final derivative chromosome that results from the structural variant; the colour of the segments corresponds to the colour of that segment in the copy-number schematic. + indicates the different derivative chromosomes created for some of the classes: that is, the structural variants are not phased to a single derivative.
Fig. 2. Frequency of structural-variant classes across tumour types.
a, Violin plots of density of classified structural-variant categories across patients within each histology group. Tumour type panels are sorted in descending order of the average number of structural-variant breakpoints per sample. Within each tumour type, the frequency distribution (y axis) of different structural-variant categories (x axis) across patients is shown as a density: regions of highest density have the greatest width of shaded area. In each panel, the number of patients is indicated at the top right. AdenoCA, adenocarcinoma; BNHL, B-cell non-Hodgkin lymphoma; ChRCC, chromophobe renal cell carcinoma; CLL, chronic lymphocytic leukaemia; CNS, central nervous system; GBM, glioblastoma; HCC, hepatocellular carcinoma; leiomyo, leiomyosarcoma; medullo, medulloblastoma; MPN, myeloproliferative neoplasm; eso, oesophageal; oligo, oligodendrocytic; panc, pancreatic; piloastro, pilocytic astrocytoma; prost, prostate; RCC, renal cell carcinoma; sarc, sarcoma; SCC, squamous cell carcinoma; TCC, transitional cell carcinoma; thy, thyroid. b, Per-sample counts of complex (bottom) and classified (top) structural-variant breakpoint junctions for oesophageal adenocarcinoma. c, Per-sample counts of complex (bottom) and classified (top) structural-variant breakpoint junctions for ovarian adenocarcinoma.
Fig. 3. Chains, cycles and bridges of templated insertions.
a–c, Examples of a typical cycle (a), chain (b) and bridge (c) of templated insertions. The estimated copy-number profile is shown as in Fig. 1, with structural variants shown as dotted arcs linking two copy-number segments. The derivative chromosome(s) that could explain the copy-number and structural-variant profile is shown below. d, e, Cycles of templated insertions that affect the TERT gene, in two hepatocellular carcinomas. KIAA1024 is also known as MINAR1.
Fig. 4. Examples of clusters of 2–5 rearrangements seen in human cancers.
a, Structures created by two local rearrangements that cannot easily be explained by simple structural-variant classes (which we call local 2-jumps). The estimated copy-number profile is shown as in Fig. 1, with structural variants shown as dotted arcs linking two copy-number segments. Possible configurations of the derivative chromosome are shown below; multiple solutions are possible for each example. Dup, duplication; invDup, duplication linked by inverted rearrangement; trp, triplication. b, Structures created by 3–4 local rearrangements that cannot easily be explained by simple structural-variant categories. c, Structures created by one local rearrangement and one rearrangement that reaches elsewhere in the genome (local–distant clusters).
Fig. 5. Size distribution and genomic properties of classified structural variants.
a, Size distribution of deletions per histology group, with tumour types ordered according to total number of events seen. Vertical dashed lines represent the two prominent modes. b, Size distribution of segments of templated insertion per histology group. For each tumour type, the three distributions for cycles, bridges and chains of templated insertions are superimposed. Ins, insertion. c, Associations between a subset of the genomic properties (rows) and classes of structural variant (columns). Each density curve represents the quantile distribution of the genomic property values at observed breakpoints compared to random genome positions. Asterisks indicate a significant departure from uniform quantiles after multiple hypothesis correction on a one-sided Kolmogorov–Smirnov test based on a sample size of 2,559 genomes containing structural variants: *false-discovery rate < 0.01, **false-discovery rate < 0.001, ***false-discovery rate < 10^−6. Cells with significant property associations are shaded by the magnitude of the shift of the median observed quantile above (blue) or below (red) 0.5. The interpretation of each property from left to right is indicated by the axes to the right of the property label. Complex uncl, complex clusters unclassified; cplxy, chromoplexy; del, deletion; inv, inversion; ins, insertion; LAD, lamina-associated domain; recip, reciprocal; TAD, topologically associated domain; TD, tandem duplication; trans, translocation; unbal, unbalanced. d, Rearrangement counts as a function of bases of junction microhomology, fit to three linear functions consistent with different formation mechanisms. NHEJ, non-homologous end joining; MMEJ, microhomology-mediated end joining; SSA, single-strand annealing. e, Enrichment or depletion of breakpoint junctions between regions of the genome with particular annotations, compared with a permuted background that preserves breakpoint positions but swaps breakpoint partners. Centre points are the mean fold change over the permuted background; error bars represent three s.d. Analysis is based on a sample size of 2,559 genomes containing structural variants. LTR, long terminal repeat; SINE, short interspersed nuclear element; LINE, long interspersed nuclear element; heterochrom, heterochromatin.
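The permuted background described for panel e can be illustrated with a minimal sketch. It is not the authors' code: it assumes each junction is reduced to a pair of annotation labels (one per breakpoint side), the helper names `count_pairs` and `fold_change` are hypothetical, and partners are shuffled uniformly, whereas the published analysis may impose further constraints.

```python
import random
from collections import Counter

def count_pairs(junctions):
    """Count junctions by unordered pair of annotation labels."""
    return Counter(tuple(sorted(j)) for j in junctions)

def fold_change(junctions, pair, n_perm=200, seed=0):
    """Observed count of `pair` divided by its mean count after partner swapping.

    Each breakpoint keeps its own annotation (its position is preserved);
    only the pairing between the two sides of junctions is shuffled.
    """
    rng = random.Random(seed)
    pair = tuple(sorted(pair))
    observed = count_pairs(junctions)[pair]
    left = [a for a, _ in junctions]
    right = [b for _, b in junctions]
    total = 0
    for _ in range(n_perm):
        shuffled = right[:]
        rng.shuffle(shuffled)  # swap partners, keep breakpoints
        total += count_pairs(zip(left, shuffled))[pair]
    mean_perm = total / n_perm
    return observed / mean_perm if mean_perm > 0 else float("inf")

# Toy data, invented for illustration: LINE-LINE junctions are over-represented.
toy = [("LINE", "LINE")] * 30 + [("SINE", "SINE")] * 30 + [("LINE", "SINE")] * 5
fc = fold_change(toy, ("LINE", "LINE"))
```

A fold change above 1 indicates that the annotation pair is joined more often than expected from the breakpoints alone; the error bars in the figure would come from the spread of the permuted counts, which this sketch does not compute.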
Fig. 6. Structural-variant signatures in human cancers.
a, The 12 most distinctive structural-variant signatures extracted by the Bayesian hierarchical Dirichlet process algorithm, run on a sample size of 2,559 genomes containing structural variants. Here the lengths of the bars represent the estimated proportion of each event class assigned to each signature (rows sum to one); the black line segments represent the 95% posterior interval for bar length from the Markov chain. FB, fold-back; mid, mid-sized. b, Association of pathogenic mutations (germline and somatic combined) in key DNA repair genes with structural-variant signatures. The sample size of patients who have pathogenic variants in the specific genes assessed is shown in brackets after each gene label (y axis). Hypothesis tests and effect sizes for each gene are derived from linear models for signature intensity after correction for histology. Significant associations from two-sided tests with correction for multiple hypothesis testing are shown. The colour and size of the points represent the estimated effect sizes. MSH refers to MSH2, MSH3, MSH4 and MSH6, genes in the mismatch repair pathway; FANC refers to genes associated with Fanconi anaemia, namely FANCA, FANCC, FANCD2, FANCE, FANCF, FANCG, FANCI, FANCL and FANCM.
Extended Data Fig. 1. Per-sample counts of structural-variant breakpoint junctions by histology group.
Counts of simple, classified structural variants are shown above the x axis and counts of complex breakpoint junctions below the x axis. Patients within each tumour type are ranked by frequency of simple structural variants.
Extended Data Fig. 2. Further examples of templated insertion chains, cycles and bridges.
Schematics follow the same structure as in Fig. 3.
Extended Data Fig. 3. Number of breakpoint junctions in cycles, bridges and chains of templated insertions.
a, Histogram of numbers of breakpoint junctions in templated insertion cycles, chains and bridges across all samples in all tumour types in the cohort. b, c, Two examples of particularly long cycles of templated insertions in the cohort. Examples are depicted in a similar manner to those in Fig. 3.
Extended Data Fig. 4. Templated insertion events that activate TERT in hepatocellular carcinoma.
a, The positions of all structural-variant breakpoints in the TERT region in the PCAWG cohort (including 50-kb flanks either side of TERT), coloured by classification and vertically spaced by the distance to the next breakpoint in the cohort. If the two sides of a breakpoint junction are contained within the plotting window, they are joined by a curved line. The number of samples with a breakpoint in the plotting window is annotated in the table in the top left. b–d, Examples of two cycles and a chain of templated insertions that affect TERT in hepatocellular carcinomas. e, Expression levels of TERT in patients with hepatocellular carcinoma (n = 187 patients), separated by whether TERT was wild type, had an activating promoter point mutation, structural variants in a templated insertion or other class. Individual patient data are shown as points. The box shows the median expression level as a thick black line, with the range of the box denoting the interquartile range. The whiskers show the range of data or 1.5× the interquartile range (whichever is lower).
Extended Data Fig. 5. Templated insertion events inactivating RB1 in breast and ovarian carcinomas.
a, The positions of all structural-variant breakpoints in the RB1 region in the PCAWG cohort (including 50-kb flanks either side of RB1), coloured by classification and vertically spaced by the distance to the next breakpoint in the cohort. If the two sides of a breakpoint junction are contained within the plotting window, they are joined by a curved line. The number of samples with a breakpoint in the plotting window is annotated in the table in the top left. b–e, Examples of three cycles and a bridge of templated insertions that affect RB1 in breast and ovarian carcinomas.
Extended Data Fig. 6. Size distribution of tandem duplications.
a, Size distribution of tandem duplications per histology group. b, Samples with more than 20 tandem duplications were grouped using hierarchical clustering according to the within-patient distribution of tandem-duplication size. Seven clusters emerged, with the size distribution of up to eight randomly chosen samples per cluster illustrated. The numbers in the top right of each panel denote the number of tandem duplications in that sample.
Extended Data Fig. 7. Size properties of clustered structural-variant classes.
a, Comparison of the minimum and maximum templated-insert size for multi-insert cycles, chains and bridges of templated insertions. b, All events with three or more templated inserts, grouped by combination of insert sizes. c, Correlations (Pearson’s correlation coefficient) and raw sizes of individual genomic segments for reciprocal inversions and local two-jumps. Each individual event is shown as a line that links the size of the individual segments in that event. The sample sizes for each event class are shown in the labels for each panel.
Extended Data Fig. 8. Relationship of an extended panel of genomic properties with structural-variant categories.
Associations between a subset of the genomic properties (rows) and classes of structural variant (columns). Each density curve represents the quantile distribution of the genomic property values at observed breakpoints, compared to random genome positions. Asterisks indicate significant departures from uniform quantiles after multiple hypothesis correction by the Benjamini–Yekutieli method on a one-sided Kolmogorov–Smirnov test, based on a sample size of 2,559 genomes containing structural variants: *false-discovery rate < 0.01, **false-discovery rate < 0.001, ***false-discovery rate < 10^−6. Cells with significant property associations are shaded by the magnitude of the shift of the median observed quantile above (blue) or below (red) 0.5. The interpretation of each property from left to right is indicated by the axes to the right of the property label.
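The Benjamini–Yekutieli correction named in this legend can be sketched as follows. This is a generic, standard-library implementation of the published procedure, not the authors' pipeline; the function names `benjamini_yekutieli` and `stars` are hypothetical, and the asterisk thresholds are simply those quoted in the legend.

```python
def benjamini_yekutieli(pvalues):
    """Return Benjamini-Yekutieli adjusted p-values in the input order."""
    m = len(pvalues)
    if m == 0:
        return []
    # Harmonic factor that makes the procedure valid under dependence.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted

def stars(q):
    """Map an adjusted value to the asterisk levels used in the legend."""
    if q < 1e-6:
        return "***"
    if q < 0.001:
        return "**"
    if q < 0.01:
        return "*"
    return ""

# Invented p-values, for illustration only.
adj = benjamini_yekutieli([0.01, 0.04, 0.03])
```

In this scheme each cell of the figure would receive one p-value from its Kolmogorov–Smirnov test, and the adjusted values would then determine the asterisks.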
Extended Data Fig. 9. Properties of structural variants at chromosomal fragile sites.
a, Structural-variant breakpoints in the most affected fragile sites: FHIT, MACROD2 and WWOX. These are coloured by classification and vertically spaced by the distance to the next breakpoint in the cohort. If the two sides of a breakpoint junction are contained within the plotting window, they are joined by a curved line. The number of samples with a breakpoint in the plotting window is annotated in the tables at the top left. b, Number of deletions and tandem duplications (top) and number of affected samples (bottom) for the 18 fragile sites considered in this analysis. c, Size distribution of deletions and tandem duplications in fragile sites (FS) compared to the rest of the genome. d, Fragile-site preference for 20 cancer histology groups as indicated by the proportion of samples that contains a deletion in each of the 18 fragile sites considered here. The number of samples is indicated in parentheses.
Extended Data Fig. 10. Consistency of associations between signatures and mutations in DNA-repair genes.
a, Box-and-whisker plots showing the number of structural variants attributed to the small-deletion signature in different types of tumour, split by BRCA2 status (BRCA2 wild type in orange; BRCA2 mutant in cyan). The box denotes the interquartile range, with the median marked as a horizontal line. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is lower. Outlier patients are shown as points. There is an increase in events attributed to the small-deletion signature when BRCA2 is mutated, across multiple types of tumour (breast, pancreatic, ovarian, prostate, lung squamous and so on). b, Box-and-whisker plots as for a, showing the number of structural variants attributed to the small-deletion signature in different types of tumour, split by PALB2 status. c, Box-and-whisker plots as for a, showing the number of structural variants attributed to the early-replicating, small-tandem-duplication signature in different types of tumour, split by BRCA1 status. d, Box-and-whisker plots as for a, showing the number of structural variants attributed to the large-tandem-duplication signature in different types of tumour, split by CDK12 status.
Extended Data Fig. 11. Patterns of structural variants causing fusion genes and enhancer hijacking.
a, Rainfall plot of structural-variant breakpoints in the genes KIAA1549 and BRAF, commonly fused together through a tandem duplication in pilocytic astrocytomas. Structural variants are coloured by classification and arranged vertically by the distance to the next breakpoint in the cohort. If the two sides of a breakpoint junction are contained within the plotting window, they are joined by a curved line. The number of samples with a breakpoint in the plotting window is annotated in the table at the top of each panel. b, Rainfall plot of structural-variant breakpoints that affect RET, commonly fused to CCDC6 by inversion in papillary thyroid cancer. c, Rainfall plot of structural-variant breakpoints that affect BCL2, commonly hijacked to the IGH immunoglobulin locus by translocations in B cell lymphomas. d, Rainfall plot of structural-variant breakpoints that affect ERG, commonly fused with TMPRSS2 by deletion or more-complex events in prostate adenocarcinoma. e, Example of a TMPRSS2-ERG fusion gene in a prostate adenocarcinoma created by a chromoplexy cycle. The estimated copy-number profile is shown as black horizontal segments, with structural variants shown as dotted arcs linking the edges of two copy-number segments. f, Example of a TMPRSS2-ERG fusion gene in a prostate adenocarcinoma created by chromothripsis.
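The vertical coordinate used in these rainfall plots, the distance from each breakpoint to the next one in the cohort, can be computed with a short sketch. It assumes all positions lie on one chromosome and have been pooled across samples; the function name `distance_to_next` and the example positions are hypothetical.

```python
def distance_to_next(positions):
    """Pair each sorted breakpoint position with the gap to the next one.

    The last breakpoint has no successor, so its gap is None.
    """
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return list(zip(ordered, gaps + [None]))

# Invented positions (bp) pooled across samples for one plotting window.
points = distance_to_next([1500, 100, 120, 1000])
```

Clusters of nearby breakpoints then appear as points with small vertical values, which is what makes recurrently rearranged loci stand out in such plots.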
Extended Data Fig. 12. Patterns of structural variants that affect selected tumour-suppressor genes.
a, Rainfall plot of structural-variant breakpoints in the gene PTEN, commonly inactivated in breast and ovarian adenocarcinomas, in which tandem-duplication signatures are frequent. Structural variants are coloured by classification and arranged vertically by the distance to the next breakpoint in the cohort. If the two sides of a breakpoint junction are contained within the plotting window, they are joined by a curved line. The number of samples with a breakpoint in the plotting window is annotated in the table at the top of each panel. b, Rainfall plot of structural-variant breakpoints that affect RAD51B, commonly inactivated in breast and ovarian adenocarcinomas. c, Rainfall plot of structural-variant breakpoints that affect CDKN2A, commonly inactivated in tumours of the gastrointestinal tract, in which deletion signatures are common. d, Rainfall plot of structural-variant breakpoints that affect SMAD4, commonly inactivated in tumours of the gastrointestinal tract.
Extended Data Fig. 13. Examples of structural variants increasing the copy number of MYC.
The estimated copy-number profile is shown as black horizontal segments, with structural variants shown as dotted arcs linking the edges of two copy-number segments.

Cited by

  • 53BP1 deficiency leads to hyperrecombination using break-induced replication (BIR).
    Shah SB, Li Y, Li S, Hu Q, Wu T, Shi Y, Nguyen T, Ive I, Shi L, Wang H, Wu X. bioRxiv [Preprint]. 2024 Sep 13:2024.09.11.612483. doi: 10.1101/2024.09.11.612483. PMID: 39314326. Free PMC article.
  • Frequent CHD1 deletions in prostate cancers of African American men is associated with rapid disease progression.
    Diossy M, Tisza V, Li H, Sahgal P, Zhou J, Sztupinszki Z, Young D, Nousome D, Kuo C, Jiang J, Chen Y, Ebner R, Sesterhenn IA, Moncur JT, Chesnut GT, Petrovics G, Klus GT, Valcz G, Nuzzo PV, Ribli D, Börcsök J, Prosz A, Krzystanek M, Ried T, Szuts D, Rizwan K, Kaochar S, Pathania S, D'Andrea AD, Csabai I, Srivastava S, Freedman ML, Dobi A, Spisak S, Szallasi Z. NPJ Precis Oncol. 2024 Sep 19;8(1):208. doi: 10.1038/s41698-024-00705-8. PMID: 39294262. Free PMC article.
  • GGTyper: genotyping complex structural variants using short-read sequencing data.
    Mirus T, Lohmayer R, Döhring C, Halldórsson BV, Kehr B. Bioinformatics. 2024 Sep 1;40(Suppl 2):ii11-ii19. doi: 10.1093/bioinformatics/btae391. PMID: 39230689. Free PMC article.
  • Tracking clonal evolution of drug resistance in ovarian cancer patients by exploiting structural variants in cfDNA.
    Williams MJ, Vázquez-García I, Tam G, Wu M, Varice N, Havasov E, Shi H, Satas G, Lees HJ, Lee JJ, Myers MA, Zatzman M, Rusk N, Ali E, Shah RH, Berger MF, Mohibullah N, Lakhman Y, Chi DS, Abu-Rustum NR, Aghajanian C, McPherson A, Zamarin D, Loomis B, Weigelt B, Friedman CF, Shah SP. bioRxiv [Preprint]. 2024 Aug 23:2024.08.21.609031. doi: 10.1101/2024.08.21.609031. PMID: 39229105. Free PMC article.
  • The genomic landscape of 2,023 colorectal cancers.
    Cornish AJ, Gruber AJ, Kinnersley B, Chubb D, Frangou A, Caravagna G, Noyvert B, Lakatos E, Wood HM, Thorn S, Culliford R, Arnedo-Pac C, Househam J, Cross W, Sud A, Law P, Leathlobhair MN, Hawari A, Woolley C, Sherwood K, Feeley N, Gül G, Fernandez-Tajes J, Zapata L, Alexandrov LB, Murugaesu N, Sosinsky A, Mitchell J, Lopez-Bigas N, Quirke P, Church DN, Tomlinson IPM, Sottoriva A, Graham TA, Wedge DC, Houlston RS. Nature. 2024 Sep;633(8028):127-136. doi: 10.1038/s41586-024-07747-9. Epub 2024 Aug 7. PMID: 39112709. Free PMC article.

References

    1. Bignell GR, et al. Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution. Genome Res. 2007;17:1296–1303. doi: 10.1101/gr.6522707.
    2. Campbell PJ, et al. The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature. 2010;467:1109–1113. doi: 10.1038/nature09460.
    3. Stephens PJ, et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell. 2011;144:27–40. doi: 10.1016/j.cell.2010.11.055.
    4. Lee JA, Carvalho CM, Lupski JR. A DNA replication mechanism for generating nonrecurrent rearrangements associated with genomic disorders. Cell. 2007;131:1235–1247. doi: 10.1016/j.cell.2007.11.037.
    5. Baca SC, et al. Punctuated evolution of prostate cancer genomes. Cell. 2013;153:666–677. doi: 10.1016/j.cell.2013.03.021.
