Structure-based, deep-learning models for protein-ligand binding affinity prediction

Debby D. Wang, Wenhui Wu, Ran Wang
DOI: https://doi.org/10.1186/s13321-023-00795-9
2024-01-06
Journal of Cheminformatics
Abstract: The launch of AlphaFold series has brought deep-learning techniques into the molecular structural science. As another crucial problem, structure-based prediction of protein-ligand binding affinity urgently calls for advanced computational techniques. Is deep learning ready to decode this problem? Here we review mainstream structure-based, deep-learning approaches for this problem, focusing on molecular representations, learning architectures and model interpretability. A model taxonomy has been generated. To compensate for the lack of valid comparisons among those models, we realized and evaluated representatives from a uniform basis, with the advantages and shortcomings discussed. This review will potentially benefit structure-based drug discovery and related areas.
chemistry, multidisciplinary, computer science, interdisciplinary applications, information systems
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper examines the current state and challenges of applying deep learning to structure-based protein-ligand binding affinity prediction (PLBAP). Specifically, it focuses on the following aspects:

1. **Molecular Representation**: How to represent the structural information of proteins and ligands so that deep learning models can process it effectively.
2. **Learning Architectures**: Which learning architectures current mainstream deep learning models adopt for PLBAP, and their pros and cons.
3. **Model Interpretability**: How to improve the interpretability of deep learning models so that their prediction mechanisms can be better understood.

### Background

The interaction between proteins and ligands is one of the key issues in drug discovery research, and predicting binding affinity is crucial for identifying potential drug candidates. Although traditional molecular docking methods can quickly generate binding poses close to experimental structures, they perform poorly in downstream tasks such as distinguishing binders from non-binders and ranking ligands. Developing more effective binding affinity prediction methods is therefore of great significance.

### Research Focus

1. **Convolutional Neural Network Based on Atomic Coordinates and Types (TACNN)**:
   - Takes atomic coordinates and types as inputs and predicts binding affinity through atomic type convolution and radial pooling operations.
   - The model has hierarchical interpretability, from atomic pair interactions to molecular-level energy accumulation, and then to the overall thermodynamic cycle.
2. **Convolutional Neural Network Based on Intermolecular Contacts (TIMC-CNN)**:
   - Represents protein-ligand interactions as intermolecular contacts and learns these features with a 2D-CNN.
   - Partial interpretability can be achieved by measuring the importance of features in affinity prediction.
3. **Convolutional Neural Network Based on Molecular Grids (TGrid-CNN)**:
   - Uses molecular grids to represent protein-ligand complexes and learns these grids with a 3D-CNN.
   - Provides some visualization strategies to assess prediction-level interpretability, such as generating heatmaps through masking operations.
4. **Graph Convolutional Network Based on Molecular Graphs (TGraph-GCN)**:
   - Represents protein-ligand complexes as graphs and learns the features of nodes and edges through graph convolutional networks.
   - The model can be interpreted at both the model level and the prediction level by measuring the importance of features to understand the prediction mechanism.

### Evaluation

To evaluate these four types of models (TACNN, TIMC-CNN, TGrid-CNN, and TGraph-GCN) on a uniform basis, the authors constructed representative models using unified training data and attribute generation rules. The evaluation data include the PDBbind Refined Set for model training, the Core Set for hyperparameter tuning, and two test sets from the CSAR-HiQ dataset.

### Conclusion

By reviewing and evaluating mainstream structure-based deep learning PLBAP models, this paper provides a reference for research in structure-based drug discovery and related fields. The paper not only discusses the advantages and disadvantages of the various models but also explores how to improve model interpretability and screening performance.
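
To make the intermolecular-contact representation described under Research Focus more concrete, the following is a minimal sketch of how such features might be computed: it counts protein-ligand atom pairs per element pair and distance bin. The function name `contact_features`, the four element types, and the distance bin edges are assumptions chosen for illustration; they are not the attribute generation rules used in the paper.

```python
import numpy as np


def contact_features(prot_xyz, prot_elems, lig_xyz, lig_elems,
                     prot_types=("C", "N", "O", "S"),
                     lig_types=("C", "N", "O", "S"),
                     bin_edges=(0.0, 4.0, 8.0, 12.0)):
    """Count protein-ligand atom pairs per (protein type, ligand type, distance bin).

    All type lists and bin edges are illustrative assumptions (distances in angstroms).
    Returns an integer array of shape (len(prot_types), len(lig_types), n_bins).
    """
    prot_xyz = np.asarray(prot_xyz, dtype=float)
    lig_xyz = np.asarray(lig_xyz, dtype=float)

    # Pairwise Euclidean distances, shape (n_protein_atoms, n_ligand_atoms).
    dist = np.linalg.norm(prot_xyz[:, None, :] - lig_xyz[None, :, :], axis=-1)

    # Bin index per pair: 0..n_bins-1 inside the range, n_bins beyond the last edge.
    n_bins = len(bin_edges) - 1
    bins = np.digitize(dist, bin_edges) - 1

    p_index = {t: i for i, t in enumerate(prot_types)}
    l_index = {t: i for i, t in enumerate(lig_types)}
    feats = np.zeros((len(prot_types), len(lig_types), n_bins), dtype=int)

    for i, p_elem in enumerate(prot_elems):
        if p_elem not in p_index:
            continue  # skip atom types outside the assumed vocabulary
        for j, l_elem in enumerate(lig_elems):
            if l_elem not in l_index:
                continue
            b = bins[i, j]
            if 0 <= b < n_bins:  # ignore pairs beyond the last bin edge
                feats[p_index[p_elem], l_index[l_elem], b] += 1
    return feats


# Toy complex: two protein atoms and one ligand atom (coordinates are made up).
toy = contact_features(
    prot_xyz=[[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]],
    prot_elems=["C", "N"],
    lig_xyz=[[3.0, 0.0, 0.0]],
    lig_elems=["O"],
)
print(toy.shape)
```

The resulting count tensor can be flattened or reshaped into a 2D array before being passed to a 2D-CNN; how the paper arranges these features is not specified in this summary, so that step is left out here.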