Nature. 2021 Nov;599(7883):91-95.
doi: 10.1038/s41586-021-04043-8. Epub 2021 Oct 27.

Disease variant prediction with deep generative models of evolutionary data

Jonathan Frazer et al. Nature. 2021 Nov.

Abstract

Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences [1-3]. In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods [4-10] have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable [11]. Here we propose an approach that leverages deep generative models to predict variant pathogenicity without relying on labels. By modelling the distribution of sequence variation across organisms, we implicitly capture constraints on the protein sequences that maintain fitness. Our model EVE (evolutionary model of variant effect) not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification [12-16]. We predict the pathogenicity of more than 36 million variants across 3,219 disease genes and provide evidence for the classification of more than 256,000 variants of unknown significance. Our work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings.

Cited by

  • Ensembl 2023.
    Martin FJ, Amode MR, Aneja A, Austine-Orimoloye O, Azov AG, Barnes I, Becker A, Bennett R, Berry A, Bhai J, Bhurji SK, Bignell A, Boddu S, Branco Lins PR, Brooks L, Ramaraju SB, Charkhchi M, Cockburn A, Da Rin Fiorretto L, Davidson C, Dodiya K, Donaldson S, El Houdaigui B, El Naboulsi T, Fatima R, Giron CG, Genez T, Ghattaoraya GS, Martinez JG, Guijarro C, Hardy M, Hollis Z, Hourlier T, Hunt T, Kay M, Kaykala V, Le T, Lemos D, Marques-Coelho D, Marugán JC, Merino GA, Mirabueno LP, Mushtaq A, Hossain SN, Ogeh DN, Sakthivel MP, Parker A, Perry M, Piližota I, Prosovetskaia I, Pérez-Silva JG, Salam AIA, Saraiva-Agostinho N, Schuilenburg H, Sheppard D, Sinha S, Sipos B, Stark W, Steed E, Sukumaran R, Sumathipala D, Suner MM, Surapaneni L, Sutinen K, Szpak M, Tricomi FF, Urbina-Gómez D, Veidenberg A, Walsh TA, Walts B, Wass E, Willhoft N, Allen J, Alvarez-Jarreta J, Chakiachvili M, Flint B, Giorgetti S, Haggerty L, Ilsley GR, Loveland JE, Moore B, Mudge JM, Tate J, Thybert D, Trevanion SJ, Winterbottom A, Frankish A, Hunt SE, Ruffier M, Cunningham F, Dyer S, Finn RD, Howe KL, Harrison PW, Yates AD, Flicek P. Nucleic Acids Res. 2023 Jan 6;51(D1):D933-D941. doi: 10.1093/nar/gkac958. PMID: 36318249.
  • A novel assessment of whole-mount Gleason grading in prostate cancer to identify candidates for radical prostatectomy: a machine learning-based multiomics study.
    Ning J, Spielvogel CP, Haberl D, Trachtova K, Stoiber S, Rasul S, Bystry V, Wasinger G, Baltzer P, Gurnhofer E, Timelthaler G, Schlederer M, Papp L, Schachner H, Helbich T, Hartenbach M, Grubmüller B, Shariat SF, Hacker M, Haug A, Kenner L. Theranostics. 2024 Aug 1;14(12):4570-4581. doi: 10.7150/thno.96921. eCollection 2024. PMID: 39239512.
  • Accurate prediction of functional effect of single amino acid variants with deep learning.
    Derbel H, Zhao Z, Liu Q. Comput Struct Biotechnol J. 2023 Nov 10;21:5776-5784. doi: 10.1016/j.csbj.2023.11.017. eCollection 2023. PMID: 38074467.
  • A Computational Approach: The Functional Effects of Thyroid Peroxidase Variants in Thyroid Cancer and Genetic Disorders.
    Sobitan A, Gebremedhin B, Yao Q, Xie G, Gu X, Li J, Teng S. JCO Clin Cancer Inform. 2024 Jan;8:e2300140. doi: 10.1200/CCI.23.00140. PMID: 38295322.
  • Calibration of additional computational tools expands ClinGen recommendation options for variant classification with PP3/BP4 criteria.
    Bergquist T, Stenton SL, Nadeau EAW, Byrne AB, Greenblatt MS, Harrison SM, Tavtigian SV, O'Donnell-Luria A, Biesecker LG, Radivojac P, Brenner SE, Pejaver V; ClinGen Sequence Variant Interpretation Working Group. bioRxiv [Preprint]. 2024 Sep 21:2024.09.17.611902. doi: 10.1101/2024.09.17.611902. PMID: 39345488.

References

    1. Van Hout, C. V. et al. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature 586, 749–756 (2020).
    2. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
    3. Landrum, M. J. & Kattman, B. L. ClinVar at five years: delivering on the promise. Hum. Mutat. 39, 1623–1630 (2018).
    4. Raimondi, D. et al. DEOGEN2: prediction and interactive visualization of single amino acid variant deleteriousness in human proteins. Nucleic Acids Res. 45, W201–W206 (2017).
    5. Feng, B. J. PERCH: a unified framework for disease gene prioritization. Hum. Mutat. 38, 243–251 (2017).
