Assessment of molecular dynamics time series descriptors in protein-ligand affinity prediction.

Pawel Siedlecki, Jakub Poziemski, Artur Yurkevych
DOI: https://doi.org/10.26434/chemrxiv-2024-dxv36
2024-05-22
Abstract: The advancement of computational methods in drug discovery, particularly through the use of machine learning (ML) and deep learning (DL), has significantly enhanced the precision of binding affinity predictions. Despite progress in computer-aided drug discovery (CADD), accurate prediction of binding affinity remains a challenge due to the complex, non-linear character of molecular interactions. Generalizability continues to limit these models, with performance discrepancies noted between training datasets and external test conditions. This study explores the integration of molecular dynamics (MD) simulations with ML to assess its predictive performance and limitations. In particular, MD simulations offer a dynamic perspective by depicting the temporal interactions within protein-ligand complexes, potentially bringing additional information for affinity and specificity estimates. By generating and analyzing over 800 unique protein-ligand MD simulations, we evaluate the utility of MD-derived descriptors based on time series in enhancing predictive accuracy. The findings suggest specific and generalizable features derived from MD data and propose approaches to augment the current in silico affinity prediction methods.
Chemistry
What problem does this paper attempt to address?
This paper examines the use of molecular dynamics (MD) time series descriptors in protein-ligand binding affinity prediction. Although computer-aided drug discovery (CADD) methods, especially machine learning (ML) and deep learning (DL), have improved the accuracy of binding affinity prediction, accurate prediction remains a challenge because molecular interactions are complex and non-linear. Current models perform differently on their training datasets than under external test conditions, which limits their ability to generalize. To address this, the researchers generated over 800 unique protein-ligand MD simulations and analyzed the time series features extracted from them to evaluate their potential for improving prediction accuracy. MD simulations show how a protein-ligand complex behaves over time, which may add information for estimating affinity and specificity. From this analysis, the study reports that MD-derived features can improve predictive performance and proposes approaches to augment current in silico affinity prediction methods. The study also indicates that the usefulness of MD data may be target-specific and may depend on the signal-to-noise ratio, which is affected by factors such as the number of simulation frames and the length of the MD simulation.
In summary, the paper asks how the non-linear information in MD data can be used to improve binding affinity prediction, and whether MD data can reveal new features that contribute to prediction. Based on large-scale experiments and analysis, the researchers propose improving machine learning model performance by carefully selecting and filtering features derived from MD data, thereby improving affinity prediction in the drug discovery process.
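As a rough illustration of what a time-series descriptor derived from an MD trajectory can look like, the following minimal Python sketch reduces one per-frame observable (for example, a ligand RMSD or a protein-ligand contact distance) to a small, fixed-length set of summary features. This is not the authors' feature set or pipeline: the function name `time_series_descriptors`, the parameter `dt_ns`, the chosen statistics, and the synthetic input are all assumptions made for the example.

```python
import numpy as np


def time_series_descriptors(series, dt_ns=1.0):
    """Summarize one per-frame MD observable as a dict of scalar features.

    Hypothetical helper for illustration only; `dt_ns` is the assumed
    spacing between saved frames, in nanoseconds.
    """
    x = np.asarray(series, dtype=float)
    if x.ndim != 1 or x.size < 3:
        raise ValueError("expected a 1-D series with at least 3 frames")

    t = np.arange(x.size) * dt_ns           # time axis in ns
    centered = x - x.mean()
    sum_sq = float(np.dot(centered, centered))

    # Lag-1 autocorrelation: how strongly each frame resembles the previous one.
    # Defined as 0.0 here for a constant series to avoid division by zero.
    lag1 = float(np.dot(centered[:-1], centered[1:]) / sum_sq) if sum_sq > 0 else 0.0

    # Slope of a least-squares line: a crude measure of drift over the run.
    slope = float(np.polyfit(t, x, 1)[0])

    return {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "min": float(x.min()),
        "max": float(x.max()),
        "lag1_autocorr": lag1,
        "slope_per_ns": slope,
    }


if __name__ == "__main__":
    # Synthetic stand-in for a per-frame distance series (not real MD data).
    rng = np.random.default_rng(0)
    fake_distance = 3.5 + 0.2 * rng.standard_normal(500)
    for name, value in time_series_descriptors(fake_distance, dt_ns=0.1).items():
        print(f"{name}: {value:.4f}")
```

Descriptors of this kind give every complex the same number of features regardless of trajectory length, so they can be passed to a standard ML model and then filtered or selected as the summary describes. Shorter or noisier trajectories would be expected to give less stable values, which is one way the dependence on frame count and simulation length could appear.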