Sci Adv. 2022 Aug 26;8(34):eabo6371.
doi: 10.1126/sciadv.abo6371. Epub 2022 Aug 26.

Cancer-driving mutations are enriched in genic regions intolerant to germline variation

Dimitrios Vitsios et al. Sci Adv.

Abstract

Large reference datasets of protein-coding variation in human populations have allowed us to determine which genes and genic subregions are intolerant to germline genetic variation. There is also a growing number of genes implicated in severe Mendelian diseases that overlap with genes implicated in cancer. We hypothesized that cancer-driving mutations might be enriched in genic subregions that are depleted of germline variation relative to somatic variation. We introduce a new metric, OncMTR (oncology missense tolerance ratio), which uses 125,748 exomes in the Genome Aggregation Database (gnomAD) to identify these genic subregions. We demonstrate that OncMTR can significantly predict driver mutations implicated in hematologic malignancies. Divergent OncMTR regions were enriched for cancer-relevant protein domains, and overlaying OncMTR scores on protein structures identified functionally important protein residues. Last, we performed a rare variant, gene-based collapsing analysis on an independent set of 394,694 exomes from the UK Biobank and found that OncMTR markedly improves genetic signals for hematologic malignancies.

Figures

Fig. 1. Defining the OncMTR score.
(A) Bimodal distribution of median allelic balance values for heterozygous variants in the gnomAD database. This distribution refers to all gnomAD variants, consistent with gnomAD AB_median figures. We defined putative somatic variants as those with an AB_median ≤ 0.3 (dashed line). (B) The top figure demonstrates the MTR distribution of TP53 when considering all missense variants (blue) and when restricted to only germline variants (i.e., AB_median > 0.3; depicted in pink). We defined OncMTR as the difference between these two distributions (bottom). (C) OncMTR scores overlaid on the AlphaFold structure for TP53. The most intolerant region maps to the DNA binding domain of the protein, which is strongly enriched for mutations known to drive hematologic malignancies.
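The definition in Fig. 1 can be sketched for a single window as follows. This is a minimal illustration, not the authors' implementation. The AB_median ≤ 0.3 cutoff comes from the caption; the MTR formula (observed missense fraction divided by the missense fraction among all possible single-nucleotide changes) is assumed from the earlier MTR literature and is not stated here; the sign convention (germline-only MTR minus all-variant MTR) is also an assumption, chosen so that somatic-enriched regions score negative, in line with the OncMTR < −0.05 threshold used in later figures. The `Variant` fields and the toy counts are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

SOMATIC_AB_MAX = 0.3  # from the caption: AB_median <= 0.3 marks a putative somatic variant


@dataclass(frozen=True)
class Variant:
    effect: str       # "missense" or "synonymous" (hypothetical field)
    ab_median: float  # median allelic balance among heterozygous carriers


def is_putative_somatic(v: Variant) -> bool:
    return v.ab_median <= SOMATIC_AB_MAX


def mtr(variants: List[Variant], possible_mis: int, possible_syn: int) -> Optional[float]:
    """Missense tolerance ratio for one window (assumed formula, see text above)."""
    mis = sum(1 for v in variants if v.effect == "missense")
    syn = sum(1 for v in variants if v.effect == "synonymous")
    if mis + syn == 0:
        return None  # undefined when no variants are observed in the window
    observed = mis / (mis + syn)
    expected = possible_mis / (possible_mis + possible_syn)
    return observed / expected


def onc_mtr(variants: List[Variant], possible_mis: int, possible_syn: int) -> Optional[float]:
    """Germline-only MTR minus all-variant MTR (sign convention is an assumption)."""
    germline = [v for v in variants if not is_putative_somatic(v)]
    mtr_all = mtr(variants, possible_mis, possible_syn)
    mtr_germline = mtr(germline, possible_mis, possible_syn)
    if mtr_all is None or mtr_germline is None:
        return None
    return mtr_germline - mtr_all


# Toy window: 3 germline missense, 3 germline synonymous, 2 putative somatic missense.
window = (
    [Variant("missense", 0.48)] * 3
    + [Variant("synonymous", 0.50)] * 3
    + [Variant("missense", 0.15)] * 2
)
score = onc_mtr(window, possible_mis=70, possible_syn=30)
print(score)  # negative: missense variation here is enriched among low-AB variants
```

In this toy window the two low-allelic-balance missense variants raise the all-variant MTR above the germline-only MTR, so the difference falls below zero.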
Fig. 2. OncMTR regions are enriched for somatic variants associated with hematologic malignancies.
(A) Cross-entropy between the MTR and MTRgermline distributions for COSMIC CGC genes, a random selection of genes, and the rest of the exome. (B) Receiver operating characteristic (ROC) curve depicting the ability of random forest models based on the raw OncMTR score, the OncMTR transcript-level percentile scores (“Tx%”), or a joint model to discriminate between 546 unique leukemogenic variants and a random size-matched set of variants. (C) Mean ROC AUCs (with fivefold cross-validation) of logistic regression models based on raw OncMTR and other genome-wide scores in predicting variants involved in leukemia [same variant set as (B)]. The putatively neutral variant sets comprise a random, size- and transcript-matched selection of variants. OncMTR (mutrate) refers to the version of OncMTR that considers the trimer mutability rates of each codon in its formulation. (D) OncMTR distributions of driver mutations for hematologic malignancies versus solid tumors, derived from the Cancer Genome Interpreter.
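To make the ROC AUC in panels (B) and (C) concrete, the sketch below computes an AUC directly from a single score using its rank interpretation: the probability that a randomly chosen positive outranks a randomly chosen negative, with ties counted as one half. It is not the random forest or cross-validated logistic regression described in the caption, and the scores are made-up toy values. Because lower OncMTR is taken to indicate stronger intolerance, the negated score is used as the predictor.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy OncMTR values (hypothetical, for illustration only).
driver_oncmtr = [-0.20, -0.08, -0.01]
neutral_oncmtr = [-0.02, 0.03, 0.05]

# Negate so that a more negative OncMTR becomes a higher predictor value.
auc = roc_auc([-s for s in driver_oncmtr], [-s for s in neutral_oncmtr])
print(round(auc, 3))  # 0.889
```

Eight of the nine driver/neutral pairs are ordered correctly in this toy set, which gives 8/9.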
Fig. 3. OncMTR distributions for genes implicated in both cancer and Mendelian disease.
(A to C) OncMTR scores for GNB1 (A), NRAS (B), and DNMT3A (C) with corresponding protein structures from the Protein Data Bank (PDB) (for NRAS, PDB ID: 6zio) or predicted by AlphaFold (13). Points on the OncMTR plots and spheres on the protein structures indicate pathogenic somatic mutations included in the TopMED leukemogenic variant set. Red points indicate variants with OncMTR < −0.05. Points with a pink outline indicate somatic leukemogenic variants that are also known to cause developmental delay (DD) when mutated de novo in the germ line. De novo mutations were aggregated from the Online Mendelian Inheritance in Man database. GDP, guanosine diphosphate.
Fig. 4. Overlap between OncMTR regions and protein domains.
(A) Pfam protein domains most strongly enriched with low OncMTR regions (OncMTR < −0.05). (B) Pfam clans most strongly enriched with low OncMTR regions. The DNA binding superfamily set was defined in a prior publication (44). (C) Proportions of genes enriched with low OncMTR scores in annotated protein domains in various cancer-related gene sets: genes carrying TopMED leukemogenic variants, annotated cancer hotspots, and the union of these three lists. (D to F) The most abundant Pfam domains enriched with low OncMTR regions in proteins encoded by the labeled sets of cancer genes. Error bars in each panel represent 95% confidence intervals (CIs). P values were calculated with Fisher’s exact test and adjusted via Bonferroni correction. Padj, adjusted P value.
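The enrichment statistics named in this caption (Fisher's exact test followed by Bonferroni correction) can be sketched with the standard library alone. The 2×2 layout assumed here (low-OncMTR versus other positions, inside versus outside a domain) and all counts are hypothetical; the caption does not specify how the contingency tables were built.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni(pvalues: List[float]) -> List[float]:
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical table: rows = inside / outside a domain,
# columns = low-OncMTR positions / other positions.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p, 4))  # 0.4857

print(bonferroni([0.01, 0.04, 0.5]))  # adjusted P values for three tests
```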
Fig. 5. Collapsing analyses using OncMTR.
(A) Effect sizes of gene-phenotype associations derived from a gene-level collapsing analysis performed on neoplasm phenotypes in 394,694 UKB exomes. “Flex” and “flexdmg” models include missense QVs with a gnomAD minor allele frequency (MAF) ≤ 0.1%. “Rare” and “raredmg” models include missense QVs with a gnomAD MAF ≤ 0.005%. “Flexdmg” and “raredmg” only consider missense variants with a REVEL score ≥ 0.5. Collapsing models are fully defined in table S12. (B) Changes in ORs observed for selected gene-phenotype associations. MDS, myelodysplastic syndrome.
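The four missense qualifying-variant (QV) models described in panel (A) can be written as a small filter. The MAF and REVEL thresholds are taken from the caption; the function and dictionary names are hypothetical, the input is assumed to be a missense variant already, and the remaining criteria (fully defined in table S12) and the OncMTR-based filter itself are not reproduced here.

```python
from typing import Optional

# Thresholds from the caption: 0.1% = 0.001 and 0.005% = 0.00005.
MODELS = {
    "flex":    {"max_maf": 0.001,   "min_revel": None},
    "flexdmg": {"max_maf": 0.001,   "min_revel": 0.5},
    "rare":    {"max_maf": 0.00005, "min_revel": None},
    "raredmg": {"max_maf": 0.00005, "min_revel": 0.5},
}


def is_qualifying(gnomad_maf: float, revel: Optional[float], model: str) -> bool:
    """Return True if a missense variant passes the MAF/REVEL filters of `model`."""
    cfg = MODELS[model]
    if gnomad_maf > cfg["max_maf"]:
        return False
    if cfg["min_revel"] is not None:
        # A variant without a REVEL score cannot pass a REVEL-filtered model.
        if revel is None or revel < cfg["min_revel"]:
            return False
    return True


# A variant at MAF 0.05% with REVEL 0.7 qualifies for the flex models only.
for name in MODELS:
    print(name, is_qualifying(0.0005, 0.7, name))
```

Keeping the thresholds in one table makes it easy to see that the "dmg" models differ from their counterparts only by the REVEL requirement.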

References

    1. Karczewski K. J., Francioli L. C., Tiao G., Cummings B. B., Alföldi J., Wang Q., Collins R. L., Laricchia K. M., Ganna A., Birnbaum D. P., Gauthier L. D., Brand H., Solomonson M., Watts N. A., Rhodes D., Singer-Berk M., England E. M., Seaby E. G., Kosmicki J. A., Walters R. K., Tashman K., Farjoun Y., Banks E., Poterba T., Wang A., Seed C., Whiffin N., Chong J. X., Samocha K. E., Pierce-Hoffman E., Zappala Z., O’Donnell-Luria A. H., Minikel E. V., Weisburd B., Lek M., Ware J. S., Vittal C., Armean I. M., Bergelson L., Cibulskis K., Connolly K. M., Covarrubias M., Donnelly S., Ferriera S., Gabriel S., Gentry J., Gupta N., Jeandet T., Kaplan D., Llanwarne C., Munshi R., Novod S., Petrillo N., Roazen D., Ruano-Rubio V., Saltzman A., Schleicher M., Soto J., Tibbetts K., Tolonen C., Wade G., Talkowski M. E.; Genome Aggregation Database Consortium, Aguilar Salinas C. A., Ahmad T., Albert C. M., Ardissino D., Atzmon G., Barnard J., Beaugerie L., Benjamin E. J., Boehnke M., Bonnycastle L. L., Bottinger E. P., Bowden D. W., Bown M. J., Chambers J. C., Chan J. C., Chasman D., Cho J., Chung M. K., Cohen B., Correa A., Dabelea D., Daly M. J., Darbar D., Duggirala R., Dupuis J., Ellinor P. T., Elosua R., Erdmann J., Esko T., Färkkilä M., Florez J., Franke A., Getz G., Glaser B., Glatt S. J., Goldstein D., Gonzalez C., Groop L., Haiman C., Hanis C., Harms M., Hiltunen M., Holi M. M., Hultman C. M., Kallela M., Kaprio J., Kathiresan S., Kim B. J., Kim Y. J., Kirov G., Kooner J., Koskinen S., Krumholz H. M., Kugathasan S., Kwak S. H., Laakso M., Lehtimäki T., Loos R. J. F., Lubitz S. A., Ma R. C. W., MacArthur D. G., Marrugat J., Mattila K. M., McCarroll S., McCarthy M. I., McGovern D., McPherson R., Meigs J. B., Melander O., Metspalu A., Neale B. M., Nilsson P. M., O’Donovan M. C., Ongur D., Orozco L., Owen M. J., Palmer C. N. A., Palotie A., Park K. S., Pato C., Pulver A. E., Rahman N., Remes A. M., Rioux J. D., Ripatti S., Roden D. M., Saleheen D., Salomaa V., Samani N. J., Scharf J., Schunkert H., Shoemaker M. B., Sklar P., Soininen H., Sokol H., Spector T., Sullivan P. F., Suvisaari J., Tai E. S., Teo Y. Y., Tiinamaija T., Tsuang M., Turner D., Tusie-Luna T., Vartiainen E., Vawter M. P., Ware J. S., Watkins H., Weersma R. K., Wessman M., Wilson J. G., Xavier R. J., Neale B. M., Daly M. J., MacArthur D. G., The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
    2. Wang Q., Dhindsa R. S., Carss K., Harper A. R., Nag A., Tachmazidou I., Vitsios D., Deevi S. V. V., Mackay A., Muthas D., Hühn M., Monkley S., Olsson H.; AstraZeneca Genomics Initiative, Wasilewski S., Smith K. R., March R., Platt A., Haefliger C., Petrovski S., Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature 597, 527–532 (2021).
    3. Dhindsa R. S., Copeland B. R., Mustoe A. M., Goldstein D. B., Natural selection shapes codon usage in the human genome. Am. J. Hum. Genet. 107, 83–95 (2020).
    4. Petrovski S., Wang Q., Heinzen E. L., Allen A. S., Goldstein D. B., Genic intolerance to functional variation and the interpretation of personal genomes. PLOS Genet. 9, e1003709 (2013).
    5. Samocha K. E., Robinson E. B., Sanders S. J., Stevens C., Sabo A., McGrath L. M., Kosmicki J. A., Rehnström K., Mallick S., Kirby A., Wall D. P., MacArthur D. G., Gabriel S. B., DePristo M., Purcell S. M., Palotie A., Boerwinkle E., Buxbaum J. D., Cook E. H. Jr., Gibbs R. A., Schellenberg G. D., Sutcliffe J. S., Devlin B., Roeder K., Neale B. M., Daly M. J., A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014).