PLoS One. 2013 Oct 30;8(10):e77945.
doi: 10.1371/journal.pone.0077945. eCollection 2013.

CanDrA: cancer-specific driver missense mutation annotation with optimized features

Yong Mao et al. PLoS One. 2013.

Abstract

Driver mutations are somatic mutations that confer a growth advantage on tumor cells, whereas passenger mutations are not functionally related to oncogenesis. Distinguishing drivers from passengers is challenging because drivers occur much less frequently than passengers, tend to have low prevalence, and have functions that are multifactorial and not intuitively obvious. Missense mutations are excellent driver candidates because they occur more frequently and are potentially easier to identify than other types of mutations. Although several methods have been developed for predicting the functional impact of missense mutations, only a few have been designed specifically to identify driver mutations. As more mutations are discovered, more accurate predictive models can be developed using machine learning approaches that systematically characterize the common and distinctive properties of missense mutations in the context of specific cancer types. Here, we present a cancer driver annotation (CanDrA) tool that predicts missense driver mutations based on a set of 95 structural and evolutionary features computed by more than 10 functional prediction algorithms, such as CHASM, SIFT, and MutationAssessor. Through feature optimization and supervised training, CanDrA outperforms existing tools in analyzing the glioblastoma multiforme and ovarian carcinoma data sets from The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia project.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Feature optimization for GBM. Plotted are the areas under the receiver operating characteristic curves (AUCs) obtained through our incremental feature selection process.
Three sets of AUCs are computed: one from the 10-fold cross-validation (CV) of the training set GBM.Ex (dotted line) and two from the independent validation (IV) of 2 test sets, GBM.S1 and GBM.S2 (solid and dashed lines). The x-axis shows the incrementally selected features. The dashed box marks the peak of the cross-validation AUC, which corresponds to the optimal feature set used for CanDrA.
Figure 2. Feature optimization for OVC. Plotted are the areas under the receiver operating characteristic curves (AUCs) obtained through our incremental feature selection process.
Three sets of AUCs are computed: one from the 10-fold cross-validation (CV) of the training set OVC.Ex (dotted line) and two from the independent validation (IV) of 2 test sets, OVC.S1 and OVC.S2 (solid and dashed lines). The x-axis shows the incrementally selected features. The dashed box marks the peak of the cross-validation AUC, which corresponds to the optimal feature set used for CanDrA.
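The incremental feature selection described in the captions of Figures 1 and 2 can be illustrated with a minimal sketch. This is not the CanDrA implementation: the greedy forward search, the toy class-centroid scorer, the fold assignment, and the synthetic data are all assumptions made for illustration. It only shows the general idea of adding one feature at a time, tracking the 10-fold cross-validated AUC, and keeping the feature set at the peak of that curve.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: share of (driver, passenger) pairs ranked correctly,
    with ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid_scores(train_X, train_y, test_X, feats):
    """Toy linear scorer (a stand-in classifier, not the one used by CanDrA):
    each feature is weighted by the difference of the two class means.
    Assumes both classes are present in the training rows."""
    def mean(rows, j):
        return sum(r[j] for r in rows) / len(rows)
    pos = [x for x, y in zip(train_X, train_y) if y == 1]
    neg = [x for x, y in zip(train_X, train_y) if y == 0]
    w = {j: mean(pos, j) - mean(neg, j) for j in feats}
    return [sum(w[j] * x[j] for j in feats) for x in test_X]

def cv_auc(X, y, feats, k=10):
    """k-fold cross-validated AUC computed on pooled held-out scores."""
    pooled = [0.0] * len(X)
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        scores = centroid_scores([X[i] for i in train_idx],
                                 [y[i] for i in train_idx],
                                 [X[i] for i in test_idx], feats)
        for i, s in zip(test_idx, scores):
            pooled[i] = s
    return auc(pooled, y)

def incremental_selection(X, y, k=10):
    """Greedy forward selection: at each step add the feature that gives the
    highest CV AUC, then return the prefix at the peak of the AUC curve."""
    remaining = list(range(len(X[0])))
    order, curve = [], []
    while remaining:
        best_auc, best_j = max((cv_auc(X, y, order + [j], k), j)
                               for j in remaining)
        order.append(best_j)
        remaining.remove(best_j)
        curve.append(best_auc)
    peak = curve.index(max(curve))
    return order, curve, order[:peak + 1]

# Synthetic data (hypothetical): features 0 and 1 separate the classes,
# features 2 to 4 are pure noise.
rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(200)]
X = [[rng.gauss(2.0 * label if j < 2 else 0.0, 1.0) for j in range(5)]
     for label in y]
order, curve, best = incremental_selection(X, y)
print("selection order:", order)
print("CV AUC per step:", [round(a, 3) for a in curve])
print("feature set at the CV peak:", best)
```

In this sketch the `curve` list plays the role of the dotted cross-validation line in the figures, and `best` corresponds to the feature set inside the dashed box. Scoring independent test sets with the same feature prefixes would give the analogue of the solid and dashed lines.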
Figure 3. Correlation between mutation score and prevalence.
Twelve algorithms (x-axis) were compared using 4 data sets: (a) GBM mutations in TP53, (b) GBM mutations in PTEN, (c) OVC mutations in TP53, and (d) OVC mutations in KRAS.
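As an illustration of how a score-versus-prevalence correlation such as the one in Figure 3 might be computed, the sketch below uses Spearman's rank correlation. The caption does not say which correlation coefficient was used, so the choice of Spearman is an assumption, and the scores and prevalence counts in the example are hypothetical values, not data from the paper.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(scores, prevalence):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(scores), average_ranks(prevalence))

# Hypothetical example: per-mutation scores from one algorithm and the
# number of tumor samples in which each mutation was observed.
scores = [0.10, 0.40, 0.35, 0.80]
prevalence = [1, 3, 2, 10]
rho = spearman(scores, prevalence)
print(round(rho, 3))
```

A rank-based coefficient is a natural choice here because prevalence counts are typically skewed, but a different coefficient would fit the same loop over algorithms and data sets.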
Figure 4. Comparison between synthetic passenger mutations (PMs) and real PMs.
Plotted are the MutationAssessor variant specificity scores of sets of synthetic PMs (generated by CHASM), CCLE PMs, TCGA PMs, and driver mutations from the 4 test sets in Table 1, for GBM (a) and OVC (b). Significant differences (Mann-Whitney U test) between two score distributions are indicated, with P values reported.
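The Mann-Whitney U test named in the Figure 4 caption compares two score distributions without assuming normality. A minimal sketch follows. It uses the large-sample normal approximation without tie or continuity corrections, which is a simplification chosen for brevity and not necessarily what the authors used. The function name and the example values are hypothetical.

```python
import math

def mann_whitney_u(a, b):
    """Return (U, two-sided p) for samples a and b.

    U counts the (a_i, b_j) pairs with a_i > b_j, ties counting as half.
    The p value uses the normal approximation with no tie or continuity
    correction, so it is only reasonable for moderately large samples.
    """
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    n1, n2 = len(a), len(b)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p

# Hypothetical scores for two sets of mutations (not data from the paper).
synthetic_pm_scores = [float(v) for v in range(20)]
real_pm_scores = [float(v) for v in range(20, 40)]
u, p = mann_whitney_u(synthetic_pm_scores, real_pm_scores)
print(u, p)
```

For real analyses a library routine that handles ties and small samples exactly would be preferable; the sketch is meant only to make the statistic behind the reported P values concrete.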
