GEFormerDTA: drug target affinity prediction based on transformer graph for early fusion

Youzhi Liu, Linlin Xing, Longbo Zhang, Hongzhen Cai, Maozu Guo
DOI: https://doi.org/10.1038/s41598-024-57879-1
IF: 4.6
2024-03-30
Scientific Reports
Abstract: Predicting the interaction affinity between drugs and target proteins is crucial for rapid and accurate drug discovery and repositioning. Therefore, more accurate prediction of drug-target affinity (DTA) has become a key area of research in the field of drug discovery and drug repositioning. However, traditional experimental methods have disadvantages such as long operation cycles, high manpower requirements, and high economic costs, making it difficult to predict specific interactions between drugs and target proteins quickly and accurately. Some methods mainly use the SMILES sequence of drugs and the primary structure of proteins as inputs, ignoring the graph information such as bond encoding, degree centrality encoding, spatial encoding of drug molecule graphs, and the structural information of proteins such as secondary structure and accessible surface area. Moreover, previous methods were based on protein sequences to learn feature representations, neglecting the completeness of information. To address the completeness of drug and protein structure information, we propose a Transformer graph-based early fusion research approach for drug-target affinity prediction (GEFormerDTA). Our method reduces prediction errors caused by insufficient feature learning. Experimental results on Davis and KIBA datasets showed a better prediction of drug-target affinity than existing affinity prediction methods.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses how to predict drug-target affinity (DTA) more quickly and accurately for drug discovery and repositioning. Traditional experimental methods suffer from long operation cycles, high labor requirements, and high economic costs, which makes it difficult to predict specific drug-protein interactions quickly and accurately. In addition, existing computational methods mainly use the SMILES sequence of drugs and the primary structure of proteins as inputs, ignoring the graph information in drug molecular graphs (such as bond encoding, degree centrality encoding, and spatial encoding) and the structural information of proteins (such as secondary structure and accessible surface area). To improve the accuracy of DTA prediction and reduce the prediction error caused by insufficient feature learning, the paper proposes a Transformer graph-based early fusion method (GEFormerDTA) that aims to make comprehensive use of the structural information of both drugs and proteins. Specifically, the research addresses these problems through the following points:

1. **Comprehensively utilize the structural information of drugs and proteins**: In addition to the traditional drug SMILES sequence and protein primary structure, the method introduces multiple feature representations of drug molecular graphs (node, degree centrality, spatial, and edge encoding features) as well as the secondary structure and accessible surface area information of proteins.
2. **Early fusion mechanism**: The binding affinity between drugs and proteins is processed through the early fusion mechanism to reduce the prediction error caused by information redundancy.
3. **Improved attention mechanism**: The Sparsepro self-attention mechanism is introduced to extract important query matrices and reduce model complexity, and feature representations are further optimized through GCN distillation operations.

With these improvements, GEFormerDTA achieves better experimental results on the Davis and KIBA datasets than existing DTA prediction methods, providing a more effective method for drug discovery and repositioning.
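
To make the drug-graph features named above more concrete, the following is a minimal sketch of two of them: a degree centrality encoding (number of bonded neighbours per atom) and a spatial encoding (shortest-path distance in bonds between every pair of atoms). It assumes a Graphormer-style interpretation of these terms; the summary does not specify how GEFormerDTA computes them, so the function names `build_adjacency`, `degree_encoding`, and `spatial_encoding`, and the toy molecule, are hypothetical and for illustration only.

```python
from collections import deque


def build_adjacency(num_atoms, bonds):
    """Build an undirected adjacency list from a list of (atom, atom) bonds."""
    adj = [[] for _ in range(num_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def degree_encoding(adj):
    """Degree centrality per atom: how many neighbours each atom is bonded to."""
    return [len(neighbours) for neighbours in adj]


def spatial_encoding(adj, unreachable=-1):
    """All-pairs shortest-path distance (in bonds), computed with BFS.

    Pairs with no connecting path keep the `unreachable` marker.
    """
    n = len(adj)
    dist = [[unreachable] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[source][v] == unreachable:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


# Toy example (assumption): heavy atoms of isopropanol, CC(C)O.
# Atom indices: 0 = C, 1 = central C, 2 = C, 3 = O.
bonds = [(0, 1), (1, 2), (1, 3)]
adj = build_adjacency(4, bonds)

degrees = degree_encoding(adj)
distances = spatial_encoding(adj)

print(degrees)
for row in distances:
    print(row)
```

In a graph Transformer, values like these are typically mapped to learned embeddings: the degree is added to the node features, and the pairwise distance is added as a bias to the attention scores. Whether GEFormerDTA uses them in exactly this way is not stated in the summary.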