Int J Mol Sci. 2021 Jul 27;22(15):8027.
doi: 10.3390/ijms22158027.

PON-Sol2: Prediction of Effects of Variants on Protein Solubility


Yang Yang et al. Int J Mol Sci.

Abstract

Genetic variations have a multitude of effects on proteins. A substantial number of variations affect protein-solvent interactions, altering either aggregation or solubility. Aggregation is often related to structural alterations, whereas solubilizable proteins in the solid phase can be made soluble again by dilution. Solubility is a central protein property, and reduced solubility can lead to disease. We developed a prediction method, PON-Sol2, to identify amino acid substitutions that increase, decrease, or have no effect on protein solubility. The method is a machine learning tool that uses a gradient boosting algorithm. It was trained on a large dataset of variants with different outcomes, after features were selected from a large number of tested properties. The method is fast and performs well: the normalized correct prediction rate for three states is 0.656 and the normalized GC2 score is 0.312 in 10-fold cross-validation. The corresponding numbers in the blind test were 0.545 and 0.157. The performance was superior to that of previous methods. The PON-Sol2 predictor is freely available. It can be used to predict the solubility effects of variants for any organism, even in large-scale projects.

Keywords: PON-Sol2; artificial intelligence; machine learning; mutation; prediction; protein solubility prediction; variation; variation interpretation.
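The two performance measures quoted in the abstract can be illustrated with a small sketch. It rests on assumptions that the abstract does not confirm: "normalized" is taken to mean rescaling the confusion matrix so that each true class carries equal weight, and GC2 is taken to be the commonly used chi-square-based generalized squared correlation. The function names and the example counts are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]  # rows = true class, columns = predicted class


def normalize_rows(cm: Matrix) -> Matrix:
    """Rescale each true-class row so all classes have equal total weight.

    Assumption: this is what "normalized" means in the abstract.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    target = total / k  # weight given to every class after rescaling
    out = []
    for row in cm:
        s = sum(row)
        if s == 0:
            raise ValueError("every true class needs at least one case")
        out.append([v * target / s for v in row])
    return out


def correct_prediction_rate(cm: Matrix) -> float:
    """Fraction of cases on the diagonal (true class == predicted class)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def gc2(cm: Matrix) -> float:
    """Generalized squared correlation: chi-square / (N * (K - 1)).

    Ranges from 0 (predictions independent of truth) to 1 (perfect).
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:  # skip cells with no expected mass
                chi2 += (cm[i][j] - expected) ** 2 / expected
    return chi2 / (n * (k - 1))


# Made-up counts for the three states (decrease, no effect, increase).
example = [[40, 15, 5], [30, 120, 30], [4, 10, 16]]
balanced = normalize_rows(example)
print("normalized CPR:", round(correct_prediction_rate(balanced), 3))
print("normalized GC2:", round(gc2(balanced), 3))
```

Normalizing before scoring keeps the largest class (here "no effect") from dominating both measures, which matters when the three states are unevenly represented.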


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Scheme for two-layer three-class classifier.
Figure 2
Predicted solubility and disease-related variations in the BTK kinase domain (PDB id 5p9j (69)); the covalent inhibitor at the ATP binding site is in yellow. (A) Numbers of variations increasing solubility, (B) numbers of variations having no effect on solubility, (C) numbers of variations decreasing solubility, and (D) numbers of XLA-causing variants. Predictions were made for all 19 single amino acid substitutions at every position. Pathogenicity-related variants were predicted with PON-P2. Keys at the bottom show the numbers of variants predicted to have the effect.
