Comparative Study
Hum Mol Genet. 2015 Apr 15;24(8):2125-37.
doi: 10.1093/hmg/ddu733. Epub 2014 Dec 30.

Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies

Chengliang Dong et al. Hum Mol Genet. 2015.

Abstract

Accurate deleteriousness prediction for nonsynonymous variants is crucial for distinguishing pathogenic mutations from background polymorphisms in whole exome sequencing (WES) studies. Although many deleteriousness prediction methods have been developed, their prediction results are sometimes inconsistent with each other and their relative merits are still unclear in practical applications. To address these issues, we comprehensively evaluated the predictive performance of 18 current deleteriousness-scoring methods, including 11 function prediction scores (PolyPhen-2, SIFT, MutationTaster, Mutation Assessor, FATHMM, LRT, PANTHER, PhD-SNP, SNAP, SNPs&GO and MutPred), 3 conservation scores (GERP++, SiPhy and PhyloP) and 4 ensemble scores (CADD, PON-P, KGGSeq and CONDEL). We found that FATHMM and KGGSeq had the highest discriminative power among independent scores and ensemble scores, respectively. Moreover, to ensure unbiased performance evaluation of these prediction scores, we manually collected three distinct testing datasets, on which no current prediction scores were tuned. In addition, we developed two new ensemble scores that integrate nine independent scores and allele frequency. Our scores achieved the highest discriminative power compared with all the deleteriousness prediction scores tested and showed a low false-positive prediction rate for benign yet rare nonsynonymous variants, which demonstrated the value of combining information from multiple orthogonal approaches. Finally, to facilitate variant prioritization in WES studies, we have pre-computed our ensemble scores for 87 347 044 possible variants in the whole exome and made them publicly available through the ANNOVAR software and the dbNSFP database.

Figures

Figure 1.
ROC curves for existing prediction scores and our ensemble scores. These two plots illustrate the performance of the quantitative predictions of existing prediction scores and our ensemble prediction scores, evaluated by the ROC curve and the area under the ROC curve (AUC). A higher AUC indicates better performance. The top plot uses testing dataset I as the benchmark dataset and the bottom plot uses testing dataset II (see Table 1). 95% CI indicates the 95% confidence interval computed with 2000 stratified bootstrap replicates.
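The AUC and its stratified bootstrap confidence interval described in this caption can be illustrated with a minimal sketch. This is not the authors' code: the function names, the pairwise AUC estimator, the simple percentile interval, and the toy scores are all assumptions chosen for illustration. "Stratified" is taken here to mean that deleterious and benign variants are resampled separately, so each replicate keeps the original class sizes.

```python
import random


def auc(pos_scores, neg_scores):
    """AUC as the fraction of (deleterious, benign) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is O(n_pos * n_neg),
    which is fine for a small illustration but slow for large datasets.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def stratified_bootstrap_ci(pos_scores, neg_scores, replicates=2000,
                            level=0.95, seed=0):
    """Percentile confidence interval for the AUC (hypothetical helper).

    Each replicate resamples the two classes separately with replacement,
    so the class sizes never change between replicates.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(replicates):
        pos_sample = rng.choices(pos_scores, k=len(pos_scores))
        neg_sample = rng.choices(neg_scores, k=len(neg_scores))
        stats.append(auc(pos_sample, neg_sample))
    stats.sort()
    tail = (1.0 - level) / 2.0
    lower = stats[int(tail * replicates)]
    upper = stats[min(replicates - 1, int((1.0 - tail) * replicates))]
    return lower, upper


# Toy scores, invented for illustration only (not data from the study).
deleterious = [0.9, 0.8, 0.4]
benign = [0.7, 0.3, 0.2]
print(auc(deleterious, benign))
print(stratified_bootstrap_ci(deleterious, benign))
```

With only a handful of variants the interval is very wide; the sketch is meant to show the resampling scheme, not to reproduce the reported intervals.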
Figure 2.
Sensitivity and specificity plots for existing prediction scores and our ensemble scores. These two plots illustrate the performance of the qualitative predictions of existing prediction scores and our ensemble prediction scores, evaluated by sensitivity and specificity. Higher sensitivity/specificity indicates better performance. The top plot uses testing dataset I as the benchmark dataset and the bottom plot uses testing dataset II (see Table 1). The legend table shows various qualitative prediction performance measurements for each prediction tool. MCC (Matthews correlation coefficient) is a correlation coefficient between the observed and predicted binary classifications, ranging from −1 to 1, where 1 indicates perfect prediction and −1 indicates total disagreement between prediction and observation. MCC=(TP×TN−FP×FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)), where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively. ACC denotes accuracy; ACC=((TP+TN)/(TP+FP+TN+FN)). TPR denotes true positive rate, or sensitivity; TPR=(TP/(TP+FN)). TNR denotes true negative rate, or specificity; TNR=(TN/(TN+FP)). FPR denotes false-positive rate; FPR=(FP/(TN+FP)). FNR denotes false-negative rate; FNR=(FN/(TP+FN)). PPV denotes positive predictive value; PPV=(TP/(TP+FP)). NPV denotes negative predictive value; NPV=(TN/(TN+FN)). FDR denotes false discovery rate; FDR=(FP/(FP+TP)). For each qualitative prediction performance measurement, the top three performance scores are highlighted. The brighter the highlight color, the better the performance.
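As a worked illustration of the measurements listed in this legend, the sketch below computes them from the four confusion-matrix counts. It uses the standard definition of MCC (a difference in the numerator and a square root in the denominator). The function name `classification_metrics`, the helper `_ratio`, and the example counts are hypothetical and not taken from the study.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, tn, fp, fn):
    """Compute the legend's performance measurements from confusion counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "MCC": _ratio(tp * tn - fp * fn, mcc_denominator),
        "ACC": _ratio(tp + tn, tp + fp + tn + fn),
        "TPR": _ratio(tp, tp + fn),  # sensitivity
        "TNR": _ratio(tn, tn + fp),  # specificity
        "FPR": _ratio(fp, tn + fp),
        "FNR": _ratio(fn, tp + fn),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
        "FDR": _ratio(fp, fp + tp),
    }


# Example counts invented for illustration (not results from the paper).
for name, value in classification_metrics(tp=90, tn=80, fp=20, fn=10).items():
    print(f"{name}: {value:.3f}")
```

Several of these measurements are complements of one another (FPR = 1 − TNR, FNR = 1 − TPR, FDR = 1 − PPV), which gives a quick sanity check on any table of reported values.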
