Prediction of Thermostability of Enzymes Based on the Amino Acid Index (AAindex) Database and Machine Learning

Gaolin Li, Lili Jia, Kang Wang, Tingting Sun, Jun Huang
DOI: https://doi.org/10.3390/molecules28248097
IF: 4.6
2023-12-15
Molecules
Abstract: The combination of wet-lab experimental data on multi-site combinatorial mutations and machine learning is an innovative method in protein engineering. In this study, we used an innovative sequence-activity relationship (innov’SAR) methodology based on novel descriptors and digital signal processing (DSP) to construct a predictive model. In this paper, 21 experimental (R)-selective amine transaminases from Aspergillus terreus (AT-ATA) were used as an input to predict higher thermostability mutants than those predicted using the existing data. We successfully improved the coefficient of determination (R²) of the model from 0.66 to 0.92. In addition, root-mean-squared deviation (RMSD), root-mean-squared fluctuation (RMSF), solvent accessible surface area (SASA), hydrogen bonds, and the radius of gyration were estimated based on molecular dynamics simulations, and the differences between the predicted mutants and the wild-type (WT) were analyzed. The successful application of the innov’SAR algorithm in improving the thermostability of AT-ATA may help in directed evolutionary screening and open up new avenues for protein engineering.
chemistry, multidisciplinary; biochemistry & molecular biology
What problem does this paper attempt to address?
This paper addresses how to predict the thermostability of enzymes using the Amino Acid Index (AAindex) database and machine learning. The authors apply an innovative sequence-activity relationship (innov'SAR) method, which builds a predictive model from novel descriptors and digital signal processing (DSP). Using 21 experimentally characterized (R)-selective amine transaminases from Aspergillus terreus (AT-ATA) as input, they set out to predict mutants with higher thermostability than those predicted from the existing data. With this approach, the coefficient of determination (R²) of the model rose from 0.66 to 0.92. The paper also uses molecular dynamics simulations to compare the predicted mutants with the wild type (WT) in terms of root-mean-squared deviation (RMSD), root-mean-squared fluctuation (RMSF), solvent accessible surface area (SASA), hydrogen bonds, and radius of gyration. These analyses help explain the structural changes in the predicted mutants and how they contribute to thermostability. The paper points out that although directed evolution is a commonly used protein engineering method, generating and screening mutation libraries is time-consuming and labor-intensive. Combining machine learning with the innov'SAR algorithm therefore offers a more efficient way to design and optimize enzyme thermostability, and may open up new avenues for protein engineering. Improved thermostability of AT-ATA is particularly relevant to its applications in pharmaceutical, fine chemical, and agrochemical synthesis.
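To make the encode-then-transform idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that the DSP step is a discrete Fourier transform, that a single AAindex entry (the Kyte-Doolittle hydropathy index, KYTJ820101) is used for encoding, and that the encoded signal is mean-centred; the summary above confirms none of these choices. The function names and the two toy sequences are hypothetical and are not AT-ATA variants. The last function shows how R² is defined.

```python
import numpy as np

# Kyte-Doolittle hydropathy index (AAindex entry KYTJ820101).
# Assumption: chosen only as an example; the summary does not say
# which AAindex entries the paper actually used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq, index=KYTE_DOOLITTLE):
    """Map each residue of a protein sequence to its index value."""
    return np.array([index[aa] for aa in seq], dtype=float)


def spectrum(seq, index=KYTE_DOOLITTLE):
    """Amplitude spectrum of the mean-centred numerical sequence.

    Assumption: mean-centring and a real FFT stand in for the
    unspecified DSP step.
    """
    signal = encode(seq, index)
    signal = signal - signal.mean()
    return np.abs(np.fft.rfft(signal))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy sequences differing by one substitution (I -> L).
wild_type = "MKVLAAGIVE"
mutant = "MKVLAAGLVE"

wt_features = spectrum(wild_type)
mut_features = spectrum(mutant)
```

Each spectrum is a fixed-length feature vector for sequences of equal length, so the vectors of all characterized variants could be stacked into a matrix and passed to any regression model, with `r_squared` used to score predictions against measured thermostability values.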