PON-Sol2: Prediction of Effects of Variants on Protein Solubility
- PMID: 34360790
- PMCID: PMC8348231
- DOI: 10.3390/ijms22158027
Abstract
Genetic variations have many effects on proteins. A substantial number of variations affect protein-solvent interactions, altering either aggregation or solubility. Aggregation is often related to structural alterations, whereas solubilizable proteins in the solid phase can be made soluble again by dilution. Solubility is a central protein property, and reduced solubility can lead to disease. We developed a prediction method, PON-Sol2, to identify amino acid substitutions that increase, decrease, or have no effect on protein solubility. The method is a machine learning tool based on a gradient boosting algorithm; it was trained on a large dataset of variants with different solubility outcomes, using features selected from a large number of tested properties. The method is fast and performs well: in 10-fold cross-validation, the normalized correct prediction rate for the three states is 0.656 and the normalized GC2 score is 0.312, and the corresponding values in the blind test were 0.545 and 0.157. This performance is superior to that of previous methods. The PON-Sol2 predictor is freely available and can be used to predict the solubility effects of variants in any organism, including in large-scale projects.
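The abstract reports "normalized" three-state scores. The sketch below shows how such scores can be computed from a confusion matrix, assuming (not confirmed by this abstract) that normalization rescales each true-class row to the same total so that class imbalance does not dominate, and that GC2 is the chi-square-based generalized squared correlation of Baldi et al., GC2 = χ² / (N(K − 1)). The function names and the example counts are hypothetical and are not taken from PON-Sol2.

```python
# Hypothetical helpers for three-state (decrease / no effect / increase)
# evaluation. Rows are true classes, columns are predicted classes.

def normalize_confusion(matrix):
    """Rescale each true-class row to a common total (assumed normalization).

    The common total is the mean row sum, so the grand total N is preserved.
    Assumes every class has at least one true case (no zero rows).
    """
    k = len(matrix)
    row_sums = [sum(row) for row in matrix]
    target = sum(row_sums) / k
    return [[cell * target / rs for cell in row]
            for row, rs in zip(matrix, row_sums)]


def correct_prediction_rate(matrix):
    """Fraction of all cases on the diagonal (overall accuracy)."""
    n = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / n


def gc2(matrix):
    """Generalized squared correlation: chi^2 / (N * (K - 1)), range [0, 1]."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:  # an empty column contributes nothing
                chi2 += (matrix[i][j] - expected) ** 2 / expected
    return chi2 / (n * (k - 1))


if __name__ == "__main__":
    cm = [[8, 1, 1], [4, 12, 4], [0, 1, 4]]  # hypothetical counts
    norm = normalize_confusion(cm)
    print(round(correct_prediction_rate(norm), 3))  # 0.733
    print(round(gc2(norm), 3))
```

With rows rescaled to equal totals, the normalized correct prediction rate equals the mean of the per-class recalls (here (0.8 + 0.6 + 0.8) / 3), which is why it is less sensitive to an over-represented class than raw accuracy. GC2 is 1 for a perfect diagonal matrix and 0 when predictions are independent of the true class.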
Keywords: PON-Sol2; artificial intelligence; machine learning; mutation; prediction; protein solubility prediction; variation; variation interpretation.
Conflict of interest statement
The authors declare no conflict of interest.
Cited by
- Developability assessment at early-stage discovery to enable development of antibody-derived therapeutics. Antib Ther. 2022 Nov 11;6(1):13-29. doi: 10.1093/abt/tbac029. eCollection 2023 Jan. PMID: 36683767. Review.
- Machine Learning-Guided Protein Engineering. ACS Catal. 2023 Oct 13;13(21):13863-13895. doi: 10.1021/acscatal.3c02743. eCollection 2023 Nov 3. PMID: 37942269. Review.
- Integration of persistent Laplacian and pre-trained transformer for protein solubility changes upon mutation. Comput Biol Med. 2024 Feb;169:107918. doi: 10.1016/j.compbiomed.2024.107918. Epub 2024 Jan 3. PMID: 38194782.
- SoluProtMutDB: A manually curated database of protein solubility changes upon mutations. Comput Struct Biotechnol J. 2022 Nov 9;20:6339-6347. doi: 10.1016/j.csbj.2022.11.009. eCollection 2022. PMID: 36420168.
- Accelerating therapeutic protein design with computational approaches toward the clinical stage. Comput Struct Biotechnol J. 2023 Apr 29;21:2909-2926. doi: 10.1016/j.csbj.2023.04.027. eCollection 2023. PMID: 38213894. Review.