Molecules. 2023 Dec 15;28(24):8097.
doi: 10.3390/molecules28248097.

Prediction of Thermostability of Enzymes Based on the Amino Acid Index (AAindex) Database and Machine Learning

Gaolin Li et al. Molecules. 2023.

Abstract

Combining wet-lab experimental data on multi-site combinatorial mutations with machine learning is an innovative approach in protein engineering. In this study, we used an innovative sequence-activity relationship (innov'SAR) methodology based on novel descriptors and digital signal processing (DSP) to construct a predictive model. Twenty-one experimentally characterized (R)-selective amine transaminase variants from Aspergillus terreus (AT-ATA) were used as input to predict mutants with higher thermostability than those predicted using the existing data. We improved the coefficient of determination (R²) of the model from 0.66 to 0.92. In addition, the root-mean-squared deviation (RMSD), root-mean-squared fluctuation (RMSF), solvent-accessible surface area (SASA), number of hydrogen bonds, and radius of gyration were estimated from molecular dynamics simulations, and the differences between the predicted mutants and the wild type (WT) were analyzed. The successful application of the innov'SAR algorithm to improving the thermostability of AT-ATA may help in directed-evolution screening and open up new avenues for protein engineering.

Keywords: artificial intelligence; directed evolution; extended sequence; machine learning; molecular dynamics simulation; thermostability.
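
The abstract describes encoding sequences with numerical descriptors and then applying digital signal processing. A minimal sketch of that idea follows, assuming the DSP step is a discrete Fourier transform of the index-encoded sequence (the abstract does not specify the transform). The index values in `TOY_INDEX`, the four-residue sequences, and the function names are hypothetical and chosen only for illustration; they are not real AAindex entries or AT-ATA data.

```python
import cmath

# Hypothetical per-residue values for illustration only; NOT a real AAindex entry.
TOY_INDEX = {"A": 0.5, "F": 1.2, "L": 1.0, "T": -0.3}


def encode(sequence, index):
    """Map each residue letter to its numeric value under one index."""
    return [index[residue] for residue in sequence]


def power_spectrum(signal):
    """Naive discrete Fourier transform; returns |X_k|^2 for every frequency k."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        x_k = sum(
            signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


# Toy sequences (assumed, not taken from the paper).
wild_type = "AFLT"
mutant = "ALTT"

# Each sequence becomes a fixed-length feature vector that a regression
# model could be trained on.
wt_features = power_spectrum(encode(wild_type, TOY_INDEX))
mut_features = power_spectrum(encode(mutant, TOY_INDEX))
```

In a setup like this, the spectra of the measured variants would serve as the regression inputs and the measured half-lives as the targets; the choice of regression model is left open here because the abstract does not name one.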

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
The overall flowchart of the innov’SAR method and molecular dynamics simulation.
Figure 2
Thermal stability plots of measured and predicted half-lives of AT-ATA variants. (a) Use of a single index: R² = 0.81. (b) The optimal combination of indices: R² = 0.86. (c) Concatenation in series of the top five single-index combinations: R² = 0.80.
Figure 3
Flowchart of the iterative process of successive concatenation. Each round uses the indices retained in the previous iteration as the basis for the extended sequence and determines the best index to retain for the current round by evaluating the performance of the model.
Figure 4
Experimental versus predicted thermal stability of AT-ATA variants. The graph was plotted using the iteratively concatenated indices (AURR980108-MEIH800103-CORJ870104): R² = 0.96.
Figure 5
Half-lives of the 1024 possible variants of AT-ATA. (■): half-life measured for WT, (◆): half-life measured for the best experimental mutant F115L_L118T, (▲): half-lives measured for the remaining single- and multi-site mutants, (●): predicted half-lives of all 1024 possible variants.
Figure 6
RMSD values of P1, P2, F115L_L118T, and WT in 100 ns simulations.
Figure 7
MD analysis of P1, P2, F115L_L118T, and WT using YASARA at 313 K in the last 20 ns. (a) RMSF of P1-A, P2-A, F115L_L118T-A, and WT-A. (b) RMSF of P1-B, P2-B, F115L_L118T-B, and WT-B.
Figure 8
SASA values of P1, P2, F115L_L118T, and WT in 100 ns simulations.
Figure 9
The number of hydrogen bonds between the A and B chains of P1, P2, F115L_L118T, and WT in 100 ns simulations.
Figure 10
The radius of gyration values of P1, P2, F115L_L118T, and WT in 100 ns simulations.
Figure 11
Schematic diagram of the innov’SAR method with extended sequences.
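
The successive-concatenation procedure described in the Figure 3 caption can be read as a greedy forward selection over indices. The sketch below illustrates that reading under stated assumptions: the index names, the toy tables, and the `toy_score` function are all hypothetical, and in practice the score would be a model-performance measure such as a cross-validated R² rather than the fixed lookup used here.

```python
def extend(sequence, indices, index_tables):
    """Build an extended sequence by concatenating the numeric encodings
    of `sequence` under each chosen index, in order."""
    encoded = []
    for name in indices:
        table = index_tables[name]
        encoded.extend(table[residue] for residue in sequence)
    return encoded


def successive_concatenation(candidates, score, rounds):
    """Greedy selection: each round keeps the index that scores best when
    appended to the indices retained in the previous rounds."""
    chosen = []
    for _ in range(rounds):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda c: score(chosen + [c]))
        chosen.append(best)
    return chosen


# Hypothetical index tables for a two-letter alphabet (illustration only).
TOY_TABLES = {
    "idx_a": {"A": 1.0, "F": 2.0},
    "idx_b": {"A": -1.0, "F": 0.5},
    "idx_c": {"A": 0.0, "F": 0.1},
}

# Stand-in for model performance; a real score would train and evaluate a model.
TOY_GAIN = {"idx_a": 0.3, "idx_b": 0.5, "idx_c": 0.1}


def toy_score(indices):
    return sum(TOY_GAIN[name] for name in indices)


selected = successive_concatenation(list(TOY_TABLES), toy_score, rounds=2)
features = extend("AF", selected, TOY_TABLES)
```

Passing the scoring function in as an argument keeps the selection loop independent of any particular model, so the same loop could be reused with whatever evaluation the study applied.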

Grants and funding

This research was financially supported by the National Natural Science Foundation of China (Grant nos. 20904047, 21673207, 21873087), the Natural Science Foundation of Zhejiang Province (Grant no. LY17A040001), and the ZUST Postgraduate Research and Innovation Fund (2022yjskc22).