Guided docking as a data generation approach facilitates structure-based machine learning on kinases

Joschka Groß, Michael Backenköhler, Andrea Volkamer, Verena Wolf
DOI: https://doi.org/10.26434/chemrxiv-2023-prk53-v2
2024-04-12
Abstract: Drug discovery pipelines nowadays rely on machine learning models to explore and evaluate large chemical spaces. While including 3D structural information is considered beneficial, structural models are hindered by the availability of protein-ligand complex structures. Exemplified for kinase drug discovery, we address this issue by generating kinase-ligand complex data using template docking for the kinase compound subset of available ChEMBL assay data. To evaluate the benefit of the created complex data, we use it to train a structure-based E(3)-invariant graph neural network (GNN). Our evaluation shows that binding affinities can be predicted with significantly higher precision by models that take synthetic binding poses into account compared to ligand or DTI models only.
Chemistry
What problem does this paper attempt to address?
This paper addresses a bottleneck in structure-based machine learning for kinase drug discovery: the scarcity of large-scale protein-ligand (PL) complex data. The authors propose guided docking as a data generation method, using template docking to produce kinase-ligand complexes for the kinase compound subset of available ChEMBL assay data. They then train a structure-based E(3)-invariant graph neural network (GNN) on these complexes to predict binding affinity. In their evaluation, models that take the synthetic binding poses into account predict binding affinities with higher precision than models based on the ligand alone or on drug-target interaction (DTI) data. The paper also discusses data partitioning strategies, the importance of structural information, and how these affect the generalization ability of deep learning models.
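
To make the term "E(3)-invariant" concrete, here is a minimal sketch, not the authors' model. It assumes a toy set of hypothetical atom coordinates and checks that interatomic distances, a typical invariant input feature, are unchanged by rotation, reflection, and translation. The helper names `pairwise_distances` and `rigid_transform` are illustrative only; the features the paper's GNN actually uses are not specified in this summary.

```python
import math
from itertools import combinations

def pairwise_distances(coords):
    """Distances between all atom pairs, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def rigid_transform(coords, angle, shift, reflect=False):
    """Optionally mirror x, rotate about the z-axis by `angle`, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in coords:
        if reflect:
            x = -x  # reflection: part of E(3) but not of SE(3)
        xr, yr = c * x - s * y, s * x + c * y
        out.append((xr + shift[0], yr + shift[1], z + shift[2]))
    return out

# Hypothetical toy coordinates (assumption, not data from the paper).
pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (-0.7, 2.0, 1.1)]
moved = rigid_transform(pose, angle=0.8, shift=(3.0, -2.0, 5.0), reflect=True)

before = pairwise_distances(pose)
after = pairwise_distances(moved)

# Distances agree up to floating-point error, so any model that sees only
# distances gives the same prediction for both copies of the pose.
invariant = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after))
print(invariant)
```

A model built on such features cannot depend on how a docked complex happens to be oriented in space, which is the property the "E(3)-invariant" label refers to.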