PON-P2: prediction method for fast and reliable identification of harmful variants

PLoS One. 2015 Feb 3;10(2):e0117380. doi: 10.1371/journal.pone.0117380. eCollection 2015.

Authors

Abhishek Niroula¹, Siddhaling Urolagin¹, Mauno Vihinen¹

Affiliation

¹ Department of Experimental Medical Science, Lund University, Lund, Sweden.

PMID: 25647319
PMCID: PMC4315405
DOI: 10.1371/journal.pone.0117380

Abstract

More reliable and faster prediction methods are needed to interpret the enormous amounts of data generated by sequencing and genome projects. We have developed a new computational tool, PON-P2, for the classification of amino acid substitutions in human proteins. The method is a machine learning-based classifier that groups variants into pathogenic, neutral and unknown classes on the basis of random forest probability scores. PON-P2 is trained using pathogenic and neutral variants obtained from VariBench, a database of benchmark variation datasets. PON-P2 utilizes information about the evolutionary conservation of sequences, physical and biochemical properties of amino acids, GO annotations and, if available, functional annotations of variation sites. Extensive feature selection identified 8 informative features out of 622 features in total. PON-P2 consistently showed superior performance in comparison to existing state-of-the-art tools. In 10-fold cross-validation, its accuracy and MCC are 0.90 and 0.80, respectively, and in the independent test they are 0.86 and 0.71, respectively. The coverage of PON-P2 (the fraction of variants assigned to the pathogenic or neutral class rather than to the unknown class) is 61.7% in 10-fold cross-validation and 62.1% on the test dataset. PON-P2 is a powerful tool for screening harmful variants and for ranking and prioritizing variants for experimental characterization. It is very fast, making it capable of analyzing large variant datasets. PON-P2 is freely available at http://structure.bmc.lu.se/PON-P2/.
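The accuracy, MCC and coverage figures above can be illustrated with a small sketch. It assumes predictions are the strings "pathogenic", "neutral" and "unknown", that variants left in the unknown class are excluded from accuracy and MCC but count against coverage, and that raw confusion-matrix counts are used. The function name `evaluate` is hypothetical, and the paper's exact computation (for example any normalization for class imbalance) may differ.

```python
from math import sqrt


def evaluate(true_labels, predicted_labels):
    """Return (accuracy, MCC, coverage) for a three-class predictor.

    Hypothetical helper: 'unknown' predictions are left out of accuracy
    and MCC, and coverage is the fraction of variants that received a
    pathogenic or neutral call. Any label other than 'pathogenic' or
    'unknown' is treated as a neutral call.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == "unknown":
            continue  # not reliably classified; only lowers coverage
        if pred == "pathogenic":
            if truth == "pathogenic":
                tp += 1
            else:
                fp += 1
        else:  # neutral call
            if truth == "neutral":
                tn += 1
            else:
                fn += 1
    classified = tp + tn + fp + fn
    coverage = classified / len(true_labels) if true_labels else 0.0
    accuracy = (tp + tn) / classified if classified else 0.0
    # Matthews correlation coefficient from the 2x2 confusion matrix
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc, coverage


truth = ["pathogenic", "pathogenic", "neutral", "neutral"]
preds = ["pathogenic", "unknown", "neutral", "pathogenic"]
print(evaluate(truth, preds))  # (0.6666666666666666, 0.5, 0.75)
```

Because unknown calls drop out of the accuracy and MCC denominators, a predictor can trade coverage for higher accuracy, which is why coverage is reported alongside the other two measures.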


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Fig 1. Overview of PON-P2 architecture and implementation.**
PON-P2 uses pre-calculated feature vectors and a bootstrap random forest for prediction. In addition, it makes use of functional and/or structural annotations when available, identifies reliably predicted variations and groups them as either pathogenic or neutral.
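Grouping a variant as pathogenic, neutral or unknown from several bootstrap-trained forests could look like the sketch below. The specific rule is an assumption, not taken from the caption: it averages the forests' pathogenicity probabilities and calls the variant only when the interval mean ± z·SD (z = 1.96, roughly a 95% level) lies entirely above or below a 0.5 threshold. The names `classify_variant`, `z` and `threshold` are hypothetical.

```python
from statistics import mean, stdev


def classify_variant(forest_scores, z=1.96, threshold=0.5):
    """Group one variant from the pathogenicity probabilities produced by
    several bootstrap-trained random forests (hypothetical rule).

    Returns (label, mean_probability), where label is 'pathogenic',
    'neutral' or 'unknown'.
    """
    if len(forest_scores) < 2:
        raise ValueError("need scores from at least two forests")
    m = mean(forest_scores)
    # Spread of the ensemble's scores; a wide spread means an unreliable call
    half_width = z * stdev(forest_scores)
    if m - half_width > threshold:
        return "pathogenic", m
    if m + half_width < threshold:
        return "neutral", m
    return "unknown", m


# Forests agree on a high probability, so the first element is 'pathogenic'
print(classify_variant([0.9, 0.85, 0.95, 0.88]))
```

Widening the interval (a larger `z`) sends more variants to the unknown class, lowering coverage in exchange for more reliable pathogenic and neutral calls.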

**Fig 2. Distribution of variations at functional and structural sites.**
The pathogenic variations are represented by white bars and neutral variations by grey bars. The functional and structural annotation sites were obtained from Swiss-Prot and PDB. Binding, binding site; Metal, metal binding site; Active, active site; IM, intra membrane region; Site, catalytic, co-factor, anti-codon, regulatory or other essential site surrounding ligands in the structure.

**Fig 3. Performance cuboids for PON-P2 and other methods.**
Six performance measures (PPV, NPV, sensitivity, specificity, acc (accuracy) and normalized MCC (nMCC = MCC×0.5+0.5)) for each method are represented by the distances of the six faces of the cuboid from the origin. (A) Performance cuboids for different feature subsets used in PON-P2. Seq prof, proportions of reference and altered amino acids and number of sequences in the multiple sequence alignment; Sel pres + Seq prof, evolutionary features; Sel pres + Seq prof + GO, evolutionary features and GO annotations. (B) Performance cuboids for PolyPhen-2, PON-P, PON-P2 and SIFT for all variations predicted by each method on the independent test dataset. The performance scores for PON-P and PON-P2 are for predictions at the 0.95 confidence level. The overall performance measures (OPMs) for PolyPhen-2, PON-P, PON-P2 and SIFT are 0.41, 0.61, 0.63 and 0.40, respectively. (C) Performance cuboids for predictors using the c95-test set. OPMs for PolyPhen-2, PON-P, PON-P2 and SIFT are 0.47, 0.61, 0.63 and 0.48, respectively.
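The nMCC rescaling in the caption, and one way of folding the six cuboid measures into a single score, can be sketched as follows. `normalized_mcc` follows the caption's formula directly. The OPM formula used here, ((PPV+NPV)(Sens+Spec)(Acc+nMCC))/8, is an assumption drawn from later work by the same group and is not stated in the caption; both function names are hypothetical.

```python
def normalized_mcc(mcc):
    """nMCC = MCC*0.5 + 0.5, mapping MCC from [-1, 1] onto [0, 1]."""
    return mcc * 0.5 + 0.5


def overall_performance_measure(ppv, npv, sensitivity, specificity,
                                accuracy, mcc):
    """Assumed OPM: product of three paired sums, scaled so that a
    perfect predictor scores 1.0 (each paired sum is at most 2)."""
    nmcc = normalized_mcc(mcc)
    return ((ppv + npv) * (sensitivity + specificity)
            * (accuracy + nmcc)) / 8


# A perfect predictor gives OPM = 1.0
print(overall_performance_measure(1, 1, 1, 1, 1, 1))
```

Rescaling MCC first puts all six inputs on a common 0 to 1 scale, so each paired sum contributes equally to the product.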


Cited by

RheoScale: A tool to aggregate and quantify experimentally determined substitution outcomes for multiple variants at individual protein positions.
Hodges AM, Fenton AW, Dougherty LL, Overholt AC, Swint-Kruse L. Hum Mutat. 2018 Dec;39(12):1814-1826. doi: 10.1002/humu.23616. Epub 2018 Aug 28. PMID: 30117637. Free PMC article.
Impact of Deleterious Mutations on Structure, Function and Stability of Serum/Glucocorticoid Regulated Kinase 1: A Gene to Diseases Correlation.
AlAjmi MF, Khan S, Choudhury A, Mohammad T, Noor S, Hussain A, Lu W, Eapen MS, Chimankar V, Hansbro PM, Sohal SS, Elasbali AM, Hassan MI. Front Mol Biosci. 2021 Nov 3;8:780284. doi: 10.3389/fmolb.2021.780284. eCollection 2021. PMID: 34805284. Free PMC article.
PON-tstab: Protein Variant Stability Predictor. Importance of Training Data Quality.
Yang Y, Urolagin S, Niroula A, Ding X, Shen B, Vihinen M. Int J Mol Sci. 2018 Mar 28;19(4):1009. doi: 10.3390/ijms19041009. PMID: 29597263. Free PMC article.
PMut: a web-based tool for the annotation of pathological variants on proteins, 2017 update.
López-Ferrando V, Gazzo A, de la Cruz X, Orozco M, Gelpí JL. Nucleic Acids Res. 2017 Jul 3;45(W1):W222-W228. doi: 10.1093/nar/gkx313. PMID: 28453649. Free PMC article.
PON-SC - program for identifying steric clashes caused by amino acid substitutions.
Čalyševa J, Vihinen M. BMC Bioinformatics. 2017 Nov 29;18(1):531. doi: 10.1186/s12859-017-1947-7. PMID: 29187139. Free PMC article.



Grants and funding

MV received funding from the Faculty of Medicine, Lund University (http://www.med.lu.se/english) and from Vetenskapsrådet (http://www.vr.se/inenglish.4.12fff4451215cbd83e4800015152.html). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


