MulinforCPI: enhancing precision of compound–protein interaction prediction through novel perspectives on multi-level information integration

Ngoc-Quang Nguyen, Sejeong Park, Mogan Gim, Jaewoo Kang
DOI: https://doi.org/10.1093/bib/bbad484
IF: 9.5
2024-01-06
Briefings in Bioinformatics
Abstract: Forecasting the interaction between compounds and proteins is crucial for discovering new drugs. However, previous sequence-based studies have not utilized three-dimensional (3D) information on compounds and proteins, such as atom coordinates and distance matrices, to predict binding affinity. Furthermore, numerous widely adopted computational techniques have relied on sequences of amino acid characters for protein representations. This approach may constrain the model's ability to capture meaningful biochemical features, impeding a more comprehensive understanding of the underlying proteins. Here, we propose a two-step deep learning strategy named MulinforCPI that incorporates transfer learning techniques with multi-level resolution features to overcome these limitations. Our approach leverages 3D information from both proteins and compounds and acquires a profound understanding of the atomic-level features of proteins. Besides, our research highlights the divide between first-principle and data-driven methods, offering new research prospects for compound–protein interaction tasks. We applied the proposed method to six datasets: Davis, Metz, KIBA, CASF-2016, DUD-E and BindingDB, to evaluate the effectiveness of our approach.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
This paper aims to improve the accuracy of compound-protein interaction (CPI) prediction. Existing sequence-based methods do not use three-dimensional (3D) information, such as atomic coordinates and distance matrices, when predicting the binding affinity between compounds and proteins, and many of them represent proteins only as sequences of amino acid characters. The paper proposes a two-step deep learning strategy called MulinforCPI, which combines transfer learning with multi-level resolution features so that the model can draw on the 3D information of both compounds and proteins and learn atomic-level protein features. The study also highlights the divide between first-principles and data-driven methods, which opens new research directions for CPI tasks. MulinforCPI consists of two stages: pre-training and fine-tuning. In the pre-training stage, the 3DInfoMax strategy is used to train a graph neural network to produce 3D-aware features for compounds. In the fine-tuning stage, ESMFold is used to construct 3D protein structures from protein sequences. A cross-attention mechanism then combines local and high-level information from compounds with atomic-level and high-level structural information from proteins to improve prediction accuracy. The method is evaluated on six datasets: Davis, Metz, KIBA, CASF-2016, DUD-E and BindingDB.
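To make the cross-attention step more concrete, the following is a minimal sketch of single-head scaled dot-product cross-attention in NumPy, where compound atom features act as queries over protein features. It is not the authors' implementation: the function names, feature dimensions, and random weights are all assumptions chosen for illustration, and the real model's layers, head counts, and inputs may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention (illustrative, hypothetical helper).

    queries: (n_q, d_query_in)  e.g. compound atom features
    context: (n_c, d_ctx_in)    e.g. protein residue or atom features
    Returns the attended features (n_q, d) and the attention weights (n_q, n_c).
    """
    q = queries @ w_q          # project compound features to a shared size d
    k = context @ w_k          # project protein features to keys
    v = context @ w_v          # project protein features to values
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each query row sums to 1
    return weights @ v, weights


# Toy inputs; sizes are assumptions, not values from the paper.
rng = np.random.default_rng(0)
compound_atoms = rng.normal(size=(5, 16))     # 5 atoms, 16-dim features
protein_residues = rng.normal(size=(12, 32))  # 12 residues, 32-dim features

d = 8  # shared attention dimension (assumed)
w_q = rng.normal(size=(16, d))
w_k = rng.normal(size=(32, d))
w_v = rng.normal(size=(32, d))

attended, weights = cross_attention(
    compound_atoms, protein_residues, w_q, w_k, w_v
)
print(attended.shape, weights.shape)
```

Each row of `weights` indicates how strongly one compound atom attends to each protein position, so the attended output is a protein-informed representation of the compound. In a trained model the projection matrices would be learned rather than sampled at random.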