MocFormer: A Two-Stage Pre-training-Driven Transformer for Drug–Target Interactions Prediction

Yi-Lun Zhang, Wen-Tao Wang, Jia-Hui Guan, Deepak Kumar Jain, Tian-Yang Wang, Swalpa Kumar Roy
DOI: https://doi.org/10.1007/s44196-024-00561-1
IF: 2.259
2024-06-27
International Journal of Computational Intelligence Systems
Abstract: Identifying drug–target interactions is essential for advancing pharmaceuticals. Traditional drug–target interaction studies rely on labor-intensive laboratory techniques. However, recent advances in computing power have elevated the importance of deep learning methods, which offer faster, more precise, and more cost-effective screening and prediction. Nonetheless, general deep learning methods often yield low-confidence results because of the complex nature of drugs and proteins, bias, limited labeled data, and feature extraction challenges. To address these challenges, a novel two-stage pre-trained framework is proposed for drug–target interaction prediction. In the first stage, pre-trained molecule and protein models produce a comprehensive feature representation, enhancing the framework's ability to handle drug and protein diversity. This also reduces bias, improving prediction accuracy. In the second stage, a transformer with bilinear pooling and a fully connected layer makes predictions from the feature vectors. Comprehensive experiments were conducted on the public DrugBank and Epigenetic-regulators datasets to evaluate the framework's effectiveness. The results demonstrate that the proposed framework outperforms state-of-the-art methods in accuracy, area under the receiver operating characteristic curve, recall, and area under the precision-recall curve. The code is available at: https://github.com/DHCGroup/MocFormer.
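The fusion step in the second stage described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it omits the transformer entirely and shows only one common form of bilinear pooling (a flattened outer product with signed square root and L2 normalization) followed by a single fully connected layer. The function names, the feature sizes (8 and 16), and the random weights are all assumptions chosen for illustration.

```python
import numpy as np


def bilinear_pool(drug_vec, prot_vec):
    """Fuse two feature vectors via a flattened outer product.

    Assumption: signed square root and L2 normalization are applied,
    as is common for bilinear pooling; the paper may differ.
    """
    outer = np.outer(drug_vec, prot_vec).ravel()
    signed_sqrt = np.sign(outer) * np.sqrt(np.abs(outer))
    norm = np.linalg.norm(signed_sqrt)
    return signed_sqrt / norm if norm > 0 else signed_sqrt


def predict_interaction(drug_vec, prot_vec, weights, bias):
    """Hypothetical prediction head: one linear layer plus a sigmoid."""
    fused = bilinear_pool(drug_vec, prot_vec)
    logit = fused @ weights + bias
    return 1.0 / (1.0 + np.exp(-logit))


# Toy stand-ins for the stage-one embeddings (sizes are illustrative).
rng = np.random.default_rng(0)
drug_vec = rng.normal(size=8)    # hypothetical molecule embedding
prot_vec = rng.normal(size=16)   # hypothetical protein embedding
weights = rng.normal(size=8 * 16) * 0.1
bias = 0.0

prob = predict_interaction(drug_vec, prot_vec, weights, bias)
print(f"predicted interaction probability: {prob:.3f}")
```

The outer product lets every drug feature interact with every protein feature, which is the usual motivation for bilinear pooling over simple concatenation; the cost is a fused vector whose length is the product of the two input sizes.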
computer science, artificial intelligence, interdisciplinary applications
What problem does this paper attempt to address?
This paper introduces MocFormer, a new method for predicting drug-target interactions (DTIs), a crucial step in the drug discovery process. Traditional DTI studies rely on time-consuming laboratory techniques. As computational power has grown, deep learning methods have become increasingly important, offering faster, more accurate, and more cost-effective screening and prediction. However, existing deep learning methods often yield low-confidence predictions because of the complexity of drugs and proteins, data biases, limited annotated data, and challenges in feature extraction.

To address these issues, MocFormer uses a two-stage pretraining-driven Transformer framework. In the first stage, pretrained molecule and protein models encode the drug and protein sequences, generating comprehensive feature representations that reduce bias and improve prediction accuracy. In the second stage, a Transformer operating on these feature vectors uses bilinear pooling and fully connected layers to make the prediction. Experiments on the public DrugBank and Epigenetic-regulators datasets show that MocFormer outperforms state-of-the-art methods in accuracy, area under the receiver operating characteristic curve (AUC), recall, and area under the precision-recall curve (AUPRC). The code has been made publicly available on GitHub.

In summary, the main contributions of the paper are:
1. It proposes MocFormer, which the authors present as the first pretraining-driven Transformer framework for DTI prediction using transfer learning.
2. Through pretraining and fine-tuning, the first stage obtains comprehensive feature vectors for molecules and proteins, while the second stage uses a Transformer to better capture DTI relationships and improve prediction accuracy.
3. Experimental results show that MocFormer outperforms recent methods on two public datasets and has potential for use in clinical practice.
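The four evaluation metrics named in the summary above can be computed as in the following sketch. It is an illustrative NumPy implementation, not the paper's evaluation code: the 0.5 decision threshold, the function names, and the toy labels and scores are assumptions, and AUPRC is approximated by average precision under the assumption that no two scores are tied.

```python
import numpy as np


def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of pairs classified correctly at the given threshold."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))


def recall(y_true, y_score, threshold=0.5):
    """Fraction of true interactions that are predicted as interactions."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return float(tp / (tp + fn))


def auroc(y_true, y_score):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def average_precision(y_true, y_score):
    """Step-wise estimate of AUPRC; assumes scores contain no ties."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score, kind="stable")
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(np.sum(precision_at_k * hits) / np.sum(hits))


# Toy labels (1 = interacting pair) and model scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

print("accuracy:", accuracy(y_true, y_score))
print("recall:  ", recall(y_true, y_score))
print("AUROC:   ", auroc(y_true, y_score))
print("AUPRC:   ", average_precision(y_true, y_score))
```

Accuracy and recall depend on the chosen threshold, whereas AUROC and AUPRC summarize ranking quality across all thresholds, which is why DTI papers typically report both kinds of metric.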