Disease variant prediction with deep generative models of evolutionary data
- PMID: 34707284
- DOI: 10.1038/s41586-021-04043-8
Erratum in
- Publisher Correction: Disease variant prediction with deep generative models of evolutionary data. Nature. 2022 Jan;601(7892):E7. doi: 10.1038/s41586-021-04207-6. PMID: 34921310. No abstract available.
Abstract
Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences [1-3]. In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods [4-10] have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable [11]. Here we propose an approach that leverages deep generative models to predict variant pathogenicity without relying on labels. By modelling the distribution of sequence variation across organisms, we implicitly capture constraints on the protein sequences that maintain fitness. Our model EVE (evolutionary model of variant effect) not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification [12-16]. We predict the pathogenicity of more than 36 million variants across 3,219 disease genes and provide evidence for the classification of more than 256,000 variants of unknown significance. Our work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings.
© 2021. The Author(s), under exclusive licence to Springer Nature Limited.
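The label-free idea in the abstract, scoring a variant by how well it fits the distribution of sequences observed across organisms, can be illustrated with a minimal sketch. This is not the EVE model: a much simpler site-independent frequency model stands in for the deep generative model, and the toy alignment, function names, and pseudocount are all assumptions made up for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fit_site_frequencies(alignment, pseudocount=1.0):
    """Estimate per-column amino acid probabilities from aligned sequences.

    Simplifying assumption: columns are independent, so no interactions
    between positions are captured (a deep generative model could capture them).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    model = []
    for col in range(length):
        # Gap characters and unknown residues are ignored when counting.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        model.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return model


def log_likelihood(model, sequence):
    """Log-probability of an ungapped sequence under the site-independent model."""
    return sum(math.log(model[i][aa]) for i, aa in enumerate(sequence))


def variant_score(model, wild_type, position, mutant_aa):
    """Log-likelihood ratio of mutant versus wild type (position is 0-based).

    More negative values mean the substitution is less compatible with the
    variation seen in the alignment. No disease labels are used anywhere.
    """
    mutant = wild_type[:position] + mutant_aa + wild_type[position + 1:]
    return log_likelihood(model, mutant) - log_likelihood(model, wild_type)


# Hypothetical toy alignment: column 0 is fully conserved, column 2 varies.
toy_alignment = ["MKVLA", "MKVLA", "MKILA", "MRVLA", "MKVLG"]
model = fit_site_frequencies(toy_alignment)
wild_type = "MKVLA"

conserved = variant_score(model, wild_type, 0, "W")  # unseen residue at a conserved site
variable = variant_score(model, wild_type, 2, "I")   # residue already seen at a variable site
print(f"M1W score: {conserved:.3f}")
print(f"V3I score: {variable:.3f}")
```

In this toy setting the substitution at the conserved column receives a lower score than the one at the variable column, which is the kind of signal an unsupervised approach would then have to map to a pathogenicity call. How that mapping is done in EVE is not described in the abstract and is not attempted here.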
Comment in
- Predicting disease variants using biodiversity and machine learning. Nat Biotechnol. 2022 Jan;40(1):27-28. doi: 10.1038/s41587-021-01187-w. PMID: 34949779. No abstract available.
Similar articles
- Deep generative models of LDLR protein structure to predict variant pathogenicity. J Lipid Res. 2023 Dec;64(12):100455. doi: 10.1016/j.jlr.2023.100455. Epub 2023 Oct 11. PMID: 37821076. Free PMC article.
- Variant Interpretation: Functional Assays to the Rescue. Am J Hum Genet. 2017 Sep 7;101(3):315-325. doi: 10.1016/j.ajhg.2017.07.014. PMID: 28886340. Free PMC article.
- Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan. Mol Biol Evol. 2021 Jan 4;38(1):318-328. doi: 10.1093/molbev/msaa204. PMID: 32770229. Free PMC article.
- Biophysical and Mechanistic Models for Disease-Causing Protein Variants. Trends Biochem Sci. 2019 Jul;44(7):575-588. doi: 10.1016/j.tibs.2019.01.003. Epub 2019 Jan 31. PMID: 30712981. Free PMC article. Review.
- Objective assessment of the evolutionary action equation for the fitness effect of missense mutations across CAGI-blinded contests. Hum Mutat. 2017 Sep;38(9):1072-1084. doi: 10.1002/humu.23266. Epub 2017 Jun 21. PMID: 28544059. Free PMC article. Review.
Cited by
- Ensembl 2023. Nucleic Acids Res. 2023 Jan 6;51(D1):D933-D941. doi: 10.1093/nar/gkac958. PMID: 36318249. Free PMC article.
- A novel assessment of whole-mount Gleason grading in prostate cancer to identify candidates for radical prostatectomy: a machine learning-based multiomics study. Theranostics. 2024 Aug 1;14(12):4570-4581. doi: 10.7150/thno.96921. eCollection 2024. PMID: 39239512. Free PMC article.
- Accurate prediction of functional effect of single amino acid variants with deep learning. Comput Struct Biotechnol J. 2023 Nov 10;21:5776-5784. doi: 10.1016/j.csbj.2023.11.017. eCollection 2023. PMID: 38074467. Free PMC article.
- A Computational Approach: The Functional Effects of Thyroid Peroxidase Variants in Thyroid Cancer and Genetic Disorders. JCO Clin Cancer Inform. 2024 Jan;8:e2300140. doi: 10.1200/CCI.23.00140. PMID: 38295322. Free PMC article.
- Calibration of additional computational tools expands ClinGen recommendation options for variant classification with PP3/BP4 criteria. bioRxiv [Preprint]. 2024 Sep 21:2024.09.17.611902. doi: 10.1101/2024.09.17.611902. PMID: 39345488. Free PMC article. Preprint.