Machine Learning Integrating Protein Structure, Sequence, and Dynamics to Predict the Enzyme Activity of Bovine Enterokinase Variants

Niccolo Alberto Elia Venanzi, Andrea Basciu, Attilio Vittorio Vargiu, Alexandros Kiparissides, Paul A. Dalby, Duygu Dikicioglu
DOI: https://doi.org/10.1021/acs.jcim.3c00999
IF: 6.162
2024-02-22
Journal of Chemical Information and Modeling
Abstract: Despite recent advances in computational protein science, the dynamic behavior of proteins, which directly governs their biological activity, cannot be gleaned from sequence information alone. To overcome this challenge, we propose a framework that integrates the peptide sequence, protein structure, and protein dynamics descriptors into machine learning algorithms to enhance their predictive capabilities and achieve improved prediction of the protein variant function. The resulting machine learning pipeline integrates traditional sequence and structure information with molecular dynamics simulation data to predict the effects of multiple point mutations on the fold improvement of the activity of bovine enterokinase variants. This study highlights how the combination of structural and dynamic data can provide predictive insights into protein functionality and address protein engineering challenges in industrial contexts.
chemistry, multidisciplinary, medicinal, computer science, interdisciplinary applications, information systems
What problem does this paper attempt to address?
This paper addresses how to predict enzyme activity with machine learning that combines protein sequence, structure, and dynamics information, focusing on variants of bovine enterokinase. The authors propose a framework that integrates traditional sequence and structure information with molecular dynamics simulation data to improve the prediction of protein variant function.
The paper starts from the observation that, despite recent advances in computational protein science, the dynamic behavior of proteins, which directly governs their biological activity, cannot be inferred from sequence information alone. To overcome this, the researchers feed sequence, structure, and dynamics descriptors into machine learning algorithms to predict the effect of multiple point mutations on the fold improvement in activity of bovine enterokinase variants.
Specifically, they worked with 312 bovine enterokinase variants, applied several machine learning models to predict the function of each variant, and compared the predictions with experimental data to evaluate model performance. The study emphasizes the value of integrating structural and dynamic data for understanding protein function and shows how the approach could accelerate and optimize protein design in protein engineering. By analyzing key biological descriptors, the researchers identify the factors that drive the models' predictions and examine the effects of specific point mutations.
The authors also discuss the challenges and opportunities of using simulation-based data as input for machine learning algorithms and offer strategies for integrating information from different levels to predict protein variant function. Finally, they select the best predictive model through statistical analysis and comparison of machine learning algorithms, and describe how to handle the variability across repeated simulations so that the model remains stable and its predictions accurate.
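As a rough illustration of how descriptors from different levels might be combined, and how variability across repeated simulations might be handled, the sketch below collapses each dynamics descriptor into a mean and a standard deviation over replicas and then merges it with sequence and structure descriptors into one feature record per variant. This is an assumption for illustration, not the authors' pipeline: the function names, the descriptor names (`rmsf`, `rg`, `sasa_change`, and so on), and the numbers are all hypothetical.

```python
from statistics import mean, stdev


def aggregate_replicas(replicas):
    """Summarize repeated simulations of one variant.

    `replicas` is a list of dicts, one per simulation run, mapping a
    (hypothetical) descriptor name to its value in that run. Each descriptor
    is reduced to its mean and its sample standard deviation across runs.
    """
    summary = {}
    for name in sorted(replicas[0]):
        values = [run[name] for run in replicas]
        summary[f"{name}_mean"] = mean(values)
        # stdev needs at least two runs; fall back to 0.0 for a single run.
        summary[f"{name}_sd"] = stdev(values) if len(values) > 1 else 0.0
    return summary


def build_feature_record(sequence_desc, structure_desc, dynamics_replicas):
    """Merge the three descriptor levels into one flat, prefixed record."""
    blocks = (
        ("seq", sequence_desc),
        ("struct", structure_desc),
        ("dyn", aggregate_replicas(dynamics_replicas)),
    )
    record = {}
    for prefix, block in blocks:
        for name, value in block.items():
            record[f"{prefix}_{name}"] = value
    return record


# Hypothetical variant with two point mutations and three simulation replicas.
example = build_feature_record(
    {"n_mutations": 2, "hydrophobicity_change": -0.4},
    {"sasa_change": 12.5},
    [
        {"rmsf": 1.0, "rg": 16.0},
        {"rmsf": 1.2, "rg": 16.4},
        {"rmsf": 1.4, "rg": 16.2},
    ],
)
```

Records built this way, one per variant, could then be passed to any regression model with the measured fold improvement in activity as the target. Keeping the per-descriptor spread alongside the mean is one simple way to let a model see how reproducible a simulated quantity was.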