Distance plus attention for binding affinity prediction

Julia Rahman, M. A. Hakim Newton, Mohammed Eunus Ali, Abdul Sattar
DOI: https://doi.org/10.1186/s13321-024-00844-x
2024-05-15
Journal of Cheminformatics
Abstract: Protein-ligand binding affinity plays a pivotal role in drug development, particularly in identifying potential ligands for target disease-related proteins. Accurate affinity predictions can significantly reduce both the time and cost involved in drug development. However, highly precise affinity prediction remains a research challenge. A key to improve affinity prediction is to capture interactions between proteins and ligands effectively. Existing deep-learning-based computational approaches use 3D grids, 4D tensors, molecular graphs, or proximity-based adjacency matrices, which are either resource-intensive or do not directly represent potential interactions. In this paper, we propose atomic-level distance features and attention mechanisms to capture better specific protein-ligand interactions based on donor-acceptor relations, hydrophobicity, and π-stacking atoms. We argue that distances encompass both short-range direct and long-range indirect interaction effects while attention mechanisms capture levels of interaction effects. On the very well-known CASF-2016 dataset, our proposed method, named Distance plus Attention for Affinity Prediction (DAAP), significantly outperforms existing methods by achieving Correlation Coefficient (R) 0.909, Root Mean Squared Error (RMSE) 0.987, Mean Absolute Error (MAE) 0.745, Standard Deviation (SD) 0.988, and Concordance Index (CI) 0.876. The proposed method also shows substantial improvement, around 2% to 37%, on five other benchmark datasets. The program and data are publicly available on the website https://gitlab.com/mahnewton/daap.
chemistry, multidisciplinary; computer science, interdisciplinary applications, information systems
What problem does this paper attempt to address?
The paper aims to improve the accuracy of protein-ligand binding affinity prediction. The authors propose a new method, Distance plus Attention for Affinity Prediction (DAAP), which captures specific interactions between proteins and ligands through atomic-level distance features and attention mechanisms.

### Background and Challenges

1. **Challenges in Drug Discovery**:
   - The traditional drug discovery process is time-consuming and costly, usually taking 10 to 15 years at a cost of approximately $2.558 billion per new drug.
   - Computational methods can accelerate this process by identifying ligands with high binding affinity to disease-related proteins.
2. **Limitations of Existing Methods**:
   - Traditional machine learning and deep learning methods perform well in binding affinity prediction, but they are either resource-intensive or unable to directly represent potential interactions.
   - Specifically, existing deep learning methods rely on 3D grids, 4D tensors, molecular graphs, or proximity-based adjacency matrices as input representations.

### Proposed Method

1. **Atomic-level Distance Features**:
   - Distance-based features are introduced, covering the distances between donor-acceptor (DA), hydrophobic (HP), and π-stacking (πS) atoms.
   - Distances capture both short-range direct and long-range indirect interaction effects.
2. **Attention Mechanism**:
   - The attention mechanism captures interaction effects at different levels.
   - It helps the model weigh the importance of the input features.
3. **Model Architecture**:
   - The distance matrices, sequence features of specific protein residues, and SMILES sequences of ligands are combined as input features.
   - A deep learning architecture is adopted, and prediction performance is enhanced by combining the outputs of multiple models.

### Experimental Results

1. **Performance Evaluation**:
   - On the CASF-2016 dataset, DAAP significantly outperforms existing methods, achieving a correlation coefficient (R) of 0.909, a root mean squared error (RMSE) of 0.987, a mean absolute error (MAE) of 0.745, a standard deviation (SD) of 0.988, and a concordance index (CI) of 0.876.
   - On five other benchmark datasets, DAAP shows a substantial improvement of around 2% to 37%.
2. **Ablation Study**:
   - Experiments with different distance matrix combinations show that combining the donor-acceptor, π-stacking, and hydrophobicity distance matrices performs best.
   - Integrating the attention mechanism significantly improves the prediction performance of the model.

### Main Contributions

1. **Introduce distance-based features** to capture protein-ligand interactions more directly.
2. **Enhance the model's ability to capture complex binding patterns** by combining protein sequence features of specific residues.
3. **Further improve prediction ability** through the deep learning architecture and attention mechanism.
4. **Adopt an ensemble approach** that averages the outputs of multiple models to make predictions more robust and reliable.

In conclusion, the paper addresses the shortcomings of existing methods for protein-ligand binding affinity prediction by proposing DAAP, which the authors report improves both the accuracy and the reliability of prediction.
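
The two most mechanical steps in the summary above, building an atom-level distance matrix and averaging the outputs of several models, can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names `pairwise_distances` and `ensemble_average`, the toy coordinates, and the toy predictions are hypothetical, and the sketch assumes that a distance matrix for one interaction type (for example donor-acceptor) holds the Euclidean distance between each selected protein atom and each selected ligand atom. The actual atom selection, padding, and model details are in the DAAP repository.

```python
import numpy as np


def pairwise_distances(protein_xyz, ligand_xyz):
    """Euclidean distance between every protein atom and every ligand atom.

    protein_xyz: (P, 3) coordinates, ligand_xyz: (L, 3) coordinates.
    Returns a (P, L) matrix. Hypothetical helper, not part of DAAP.
    """
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    # Broadcast to (P, L, 3) coordinate differences, then reduce over x, y, z.
    diff = protein_xyz[:, None, :] - ligand_xyz[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def ensemble_average(predictions):
    """Average affinity predictions over models.

    predictions: (n_models, n_complexes). Returns (n_complexes,).
    Hypothetical helper, not part of DAAP.
    """
    return np.asarray(predictions, dtype=float).mean(axis=0)


# Toy data (made up): two protein donor atoms and one ligand acceptor atom.
protein_donors = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
ligand_acceptors = [[0.0, 4.0, 0.0]]
da_matrix = pairwise_distances(protein_donors, ligand_acceptors)

# Toy predictions (made up): three models, two protein-ligand complexes.
model_outputs = [[6.0, 7.5], [6.4, 7.1], [6.2, 7.3]]
final_prediction = ensemble_average(model_outputs)

print(da_matrix)
print(final_prediction)
```

Under the same assumption, one such matrix would be built per interaction type (donor-acceptor, hydrophobic, π-stacking) before being passed to the network together with the sequence and SMILES features.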