Identifying Mendelian disease genes with the variant effect scoring tool

doi:10.1186/1471-2164-14-S3-S3

Comparative Study

BMC Genomics. 2013;14 Suppl 3(Suppl 3):S3.

doi: 10.1186/1471-2164-14-S3-S3. Epub 2013 May 28.

Hannah Carter¹, Christopher Douville, Peter D Stenson, David N Cooper, Rachel Karchin

PMID: 23819870
PMCID: PMC3665549
DOI: 10.1186/1471-2164-14-S3-S3

Hannah Carter et al. BMC Genomics. 2013.

Affiliation

¹ Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, 3400 N. Charles St., Baltimore, Maryland, USA.

Abstract

Background: Whole exome sequencing studies identify hundreds to thousands of rare protein coding variants of ambiguous significance for human health. Computational tools are needed to accelerate the identification of specific variants and genes that contribute to human disease.

Results: We have developed the Variant Effect Scoring Tool (VEST), a supervised machine learning-based classifier, to prioritize rare missense variants with likely involvement in human disease. The VEST classifier training set comprised ~45,000 disease mutations from the latest Human Gene Mutation Database release and another ~45,000 high-frequency (allele frequency >1%) putatively neutral missense variants from the Exome Sequencing Project. VEST outperforms some of the most popular methods for prioritizing missense variants in carefully designed holdout benchmarking experiments (VEST ROC AUC = 0.91, PolyPhen2 ROC AUC = 0.86, SIFT4.0 ROC AUC = 0.84). VEST estimates variant score p-values against a null distribution of VEST scores for neutral variants not included in the VEST training set. These p-values can be aggregated at the gene level across multiple disease exomes to rank genes for probable disease involvement. We tested the ability of an aggregate VEST gene score to identify candidate Mendelian disease genes, based on whole-exome sequencing of a small number of disease cases. We used whole-exome data for two Mendelian disorders for which the causal gene is known. Considering only genes that contained variants in all cases, the VEST gene score ranked dihydroorotate dehydrogenase (DHODH) number 2 of 2253 genes in four cases of Miller syndrome, and myosin-3 (MYH3) number 2 of 2313 genes in three cases of Freeman-Sheldon syndrome.

Conclusions: Our results demonstrate the potential power gain of aggregating bioinformatics variant scores into gene-level scores and the general utility of bioinformatics in assisting the search for disease genes in large-scale exome sequencing studies. VEST is available as a stand-alone software package at http://wiki.chasmsoftware.org and is hosted by the CRAVAT web server at http://www.cravat.us.
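
The Results describe variant p-values estimated against a null distribution of VEST scores for neutral variants. The sketch below illustrates one common way to do this, an empirical p-value with a +1 pseudocount. The function name and the toy null scores are hypothetical and are not taken from the VEST software; the exact estimator used by the authors is not stated in this abstract.

```python
from bisect import bisect_left

def empirical_p_value(score, sorted_null):
    """Fraction of null scores >= score, with a +1 pseudocount so p is never 0."""
    n = len(sorted_null)
    n_ge = n - bisect_left(sorted_null, score)  # null scores at or above `score`
    return (n_ge + 1) / (n + 1)

# Toy null distribution (illustrative only): neutral variants mostly score low.
null_scores = sorted([0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.41, 0.55, 0.70])

p_high = empirical_p_value(0.90, null_scores)  # score above every null value
p_low = empirical_p_value(0.08, null_scores)   # score near the bottom of the null
```

A score that exceeds most of the null distribution gets a small p-value, which is what makes the per-variant values suitable inputs for gene-level aggregation.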

Figures

**Figure 1**
**VEST Classifier performance**. Receiver Operating Characteristic (left) and precision-recall (right) curves for VEST were constructed using 5-fold gene holdout cross validation on the VEST training set. The AUC statistics for these two curves were both 0.92, indicating that the VEST classifier has good sensitivity and specificity for identifying mutations with functional consequences for protein activity.
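
Gene holdout cross validation, as named in this caption, keeps every variant from a given gene in the same fold, so no gene contributes to both training and test data. The following is a minimal sketch under that reading. The function name is hypothetical, and the round-robin assignment of genes to folds is an assumption; the caption does not say how the authors partitioned genes.

```python
from collections import defaultdict

def gene_holdout_folds(variant_genes, n_folds=5):
    """Yield (train_idx, test_idx) so that each gene appears in exactly one test fold."""
    by_gene = defaultdict(list)
    for idx, gene in enumerate(variant_genes):
        by_gene[gene].append(idx)
    # Assumption: genes are dealt round-robin into folds in sorted order.
    fold_of_gene = {g: i % n_folds for i, g in enumerate(sorted(by_gene))}
    for fold in range(n_folds):
        test = [i for g, idxs in by_gene.items() if fold_of_gene[g] == fold for i in idxs]
        train = [i for g, idxs in by_gene.items() if fold_of_gene[g] != fold for i in idxs]
        yield sorted(train), sorted(test)

# One gene label per variant (toy data).
genes = ["DHODH", "MYH3", "DHODH", "BRCA1", "TP53", "MYH3", "CFTR", "TTN"]
folds = list(gene_holdout_folds(genes, n_folds=5))
```

Splitting by gene rather than by variant avoids rewarding a classifier for memorizing gene-specific features shared between training and test variants.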

**Figure 2**
**Comparison of VEST with popular methods PolyPhen2 and SIFT4.0**. Receiver Operating Characteristic (left) and precision-recall curve (right) for VEST (A), PolyPhen2 (B) and SIFT4.0 (C). The color bar for SIFT is reversed since a low SIFT score corresponds to positive class prediction. ROC AUC is 0.92, 0.85, 0.84 for VEST, PolyPhen2 and SIFT respectively. PR AUC is 0.88, 0.76, 0.72 for VEST, PolyPhen2 and SIFT respectively.
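
For readers unfamiliar with the ROC AUC values quoted in this caption, the sketch below computes ROC AUC as the probability that a randomly chosen positive example outscores a randomly chosen negative one (ties count half). The function name and toy data are hypothetical; this is a generic illustration, not the authors' benchmarking code.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive-negative pairs ranked correctly; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]   # toy classifier scores
labels = [1, 1, 1, 0, 0, 0]                # 1 = disease, 0 = neutral
auc = roc_auc(scores, labels)

# For a method where a LOW score means a positive prediction (as the caption
# notes for SIFT), negate the scores before computing the AUC.
auc_reversed = roc_auc([-s for s in scores], labels)
```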

**Figure 3**
**Power to detect disease genes in simulated cases of locus heterogeneity**. Estimated power to detect disease genes in the presence of locus heterogeneity when (A) seven, three and one exomes share disease genes; (B) three, two and one exomes share disease genes; (C) ten and one exomes share disease genes; (D) each of four exomes results from a distinct disease gene. In each case, gene p-values acquired using both Fisher's and Stouffer's methods are compared. Power is shown for raw p-values as well as Benjamini-Hochberg adjusted p-values. The height of each bar corresponds to the number of simulations in which the gene received a p-value or adjusted p-value <0.05.
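
The caption names three standard procedures: Fisher's and Stouffer's methods for combining p-values, and the Benjamini-Hochberg adjustment. The sketch below shows textbook versions of each using only the Python standard library. Function names are hypothetical, Stouffer's method is shown unweighted, and inputs are assumed to lie strictly between 0 and 1; whether the authors used weights or other variations is not stated here.

```python
import math
from statistics import NormalDist

def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k d.o.f. under the null."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Closed-form chi-square survival function for even degrees of freedom (2k).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

def stouffer_combine(pvalues):
    """Stouffer's method: sum of z-scores scaled by sqrt(k), converted back to a p-value."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

variant_p = [0.05, 0.05]                 # toy per-variant p-values for one gene
gene_p_fisher = fisher_combine(variant_p)
gene_p_stouffer = stouffer_combine(variant_p)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])  # toy gene-level p-values
```

Both combination methods turn two borderline variant p-values into a stronger gene-level signal, which is the kind of power gain from aggregation that the abstract refers to.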

**Figure 4**
**Comparison of VEST score distribution for three empirical null models**. Density plots created from VEST score distributions for three empirical null models representing neutral human missense variation. Null model mutations were filtered to remove overlap with the VEST training set, then scored with the VEST classifier. The Swissprot-based null shows an enrichment for large VEST scores in the right tail, indicating predicted functional mutations.

**Figure 5**
**Sensitivity of gene score to mutation count and fraction of functional mutations at different effect sizes**. Power to detect disease genes was estimated using simulations in R. Mutation counts and fraction of functional mutations were varied at four different effect sizes (0.5, 1.0, 1.5 and 2.0). A distinct plot represents the results of the simulation for each effect size. The legend on the top right shows the fraction of disease mutations simulated in each gene.

**Figure 6**
**Sensitivity of gene score to VEST classification error**. Power simulations were repeated with an additional parameter: VEST true positive rate (TPR). Four TPRs were selected based on VEST generalization error estimates. A set of simulations is shown for each of the four TPR values (60%, 70%, 80% and 90%). As expected, power to detect disease genes decreases as the TPR decreases.
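
The power simulations described in the Figure 5 and Figure 6 captions were run in R and their details are not given here. The Python sketch below is only a rough illustration of the general idea under stated assumptions: neutral mutations contribute standard normal z-scores, functional mutations are shifted by the effect size (in standard deviations) when the classifier detects them with probability equal to the TPR, and the gene score is an unweighted Stouffer combination. All names and modelling choices are hypothetical.

```python
import math
import random
from statistics import NormalDist

def estimate_power(n_mut, frac_functional, effect, tpr=1.0,
                   n_sim=2000, alpha=0.05, seed=0):
    """Fraction of simulated genes whose combined p-value falls below alpha."""
    rng = random.Random(seed)
    nd = NormalDist()
    n_func = round(n_mut * frac_functional)  # number of truly functional mutations
    hits = 0
    for _ in range(n_sim):
        z_sum = 0.0
        for j in range(n_mut):
            # Assumption: a functional mutation is shifted by `effect` only when
            # the classifier detects it (probability tpr); otherwise it looks neutral.
            shifted = j < n_func and rng.random() < tpr
            z_sum += rng.gauss(effect if shifted else 0.0, 1.0)
        gene_p = 1.0 - nd.cdf(z_sum / math.sqrt(n_mut))  # Stouffer-style gene p-value
        hits += gene_p < alpha
    return hits / n_sim

power_strong = estimate_power(n_mut=5, frac_functional=1.0, effect=2.0)
power_weak = estimate_power(n_mut=5, frac_functional=0.2, effect=0.5)
power_tpr_90 = estimate_power(n_mut=5, frac_functional=1.0, effect=1.0, tpr=0.9)
power_tpr_60 = estimate_power(n_mut=5, frac_functional=1.0, effect=1.0, tpr=0.6)
```

Under this toy model, power rises with effect size and with the fraction of functional mutations, and falls as the TPR decreases, which matches the qualitative trend the captions describe.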

Cited by

The first exome wide association study in Tunisia: identification of candidate loci and pathways with biological relevance for type 2 diabetes.
Dallali H, Boukhalfa W, Kheriji N, Fassatoui M, Jmel H, Hechmi M, Gouiza I, Gharbi M, Kammoun W, Mrad M, Taoueb M, Krir A, Trabelsi H, Bahlous A, Jamoussi H, Messaoud O, Abid A, Kefi R. Front Endocrinol (Lausanne). 2023 Dec 19;14:1293124. doi: 10.3389/fendo.2023.1293124. eCollection 2023. PMID: 38192426.
Statistically identifying tumor suppressors and oncogenes from pan-cancer genome-sequencing data.
Kumar RD, Searleman AC, Swamidass SJ, Griffith OL, Bose R. Bioinformatics. 2015 Nov 15;31(22):3561-8. doi: 10.1093/bioinformatics/btv430. Epub 2015 Jul 25. PMID: 26209800.
Characterization of CYP2D6 Pharmacogenetic Variation in Sub-Saharan African Populations.
Twesigomwe D, Drögemöller BI, Wright GEB, Adebamowo C, Agongo G, Boua PR, Matshaba M, Paximadis M, Ramsay M, Simo G, Simuunza MC, Tiemessen CT, Lombard Z, Hazelhurst S. Clin Pharmacol Ther. 2023 Mar;113(3):643-659. doi: 10.1002/cpt.2749. Epub 2022 Oct 21. PMID: 36111505.
Enhancing Missense Variant Pathogenicity Prediction with MissenseNet: Integrating Structural Insights and ShuffleNet-Based Deep Learning Techniques.
Liu J, Chen Y, Huang K, Guan X. Biomolecules. 2024 Sep 2;14(9):1105. doi: 10.3390/biom14091105. PMID: 39334871.
Targeted sequencing of the LRRTM gene family in suicide attempters with bipolar disorder.
Reichman RD, Gaynor SC, Monson ET, Gaine ME, Parsons MG, Zandi PP, Potash JB, Willour VL. Am J Med Genet B Neuropsychiatr Genet. 2020 Mar;183(2):128-139. doi: 10.1002/ajmg.b.32767. Epub 2019 Dec 19. PMID: 31854516.

Grants and funding

CA 152432/CA/NCI NIH HHS/United States
