Mining drug-target interactions from biomedical literature using chemical and gene descriptions-based ensemble transformer model

Jehad Aldahdooh, Ziaurrehman Tanoli, Jing Tang
DOI: https://doi.org/10.1101/2023.07.24.550359
2024-03-05
Abstract: Drug-target interactions (DTIs) play a pivotal role in drug discovery, which aims to identify potential drug targets and elucidate the mechanisms of action of drugs. In recent years, natural language processing (NLP), particularly in combination with pre-trained language models, has gained considerable momentum in the biomedical domain, offering the potential to mine vast amounts of text and efficiently extract DTIs from the literature. In this article, we frame DTI extraction as an entity-relationship extraction problem and apply several pre-trained transformer language models, such as BERT, to extract DTIs. Our results indicate that an ensemble approach that combines gene descriptions from the Entrez Gene database with chemical descriptions from the Comparative Toxicogenomics Database (CTD) is critical for achieving optimal performance. The proposed model achieves an F1 score of on the hidden DrugProt test set, which is the top-ranked performance among all the submitted models in the official evaluation. Furthermore, we conduct a comparative analysis of gene textual descriptions sourced from the Entrez Gene and UniProt databases to gain insights into their impact on performance. Our findings highlight the potential of NLP-based text mining using gene and chemical descriptions to improve drug-target extraction. The datasets used in this study are accessible at .
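The abstract does not say how the descriptions are fed to the models or how the ensemble combines them, so the following is only a minimal sketch under stated assumptions. It assumes (hypothetically) that each chemical-gene pair is presented as the sentence with the two mentions wrapped in marker tokens, followed by the CTD chemical description and the Entrez Gene description joined by `[SEP]`, and that the ensemble uses soft voting (averaging class probabilities across fine-tuned models). The marker tokens, the `build_input` and `soft_vote` helpers, the label subset, and the probability values are all illustrative; transformer fine-tuning itself is omitted.

```python
from typing import List, Sequence, Tuple

# Illustrative subset of DrugProt relation labels plus a "NONE" class (assumption).
LABELS: List[str] = ["NONE", "INHIBITOR", "ACTIVATOR", "SUBSTRATE"]


def build_input(sentence: str, chem: str, gene: str,
                chem_desc: str, gene_desc: str, sep: str = " [SEP] ") -> str:
    """Mark the first chemical and gene mentions, then append both descriptions.

    The <c>/<g> marker tokens and the [SEP]-joined layout are hypothetical choices.
    """
    marked = sentence.replace(chem, f"<c> {chem} </c>", 1)
    marked = marked.replace(gene, f"<g> {gene} </g>", 1)
    return sep.join([marked, chem_desc, gene_desc])


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> Tuple[str, List[float]]:
    """Average per-model class probabilities and return the top label (soft voting)."""
    if not prob_lists:
        raise ValueError("need at least one model's probabilities")
    n_models = len(prob_lists)
    n_classes = len(LABELS)
    for probs in prob_lists:
        if len(probs) != n_classes:
            raise ValueError("each probability vector must match LABELS")
    avg = [sum(p[i] for p in prob_lists) / n_models for i in range(n_classes)]
    best = max(range(n_classes), key=lambda i: avg[i])
    return LABELS[best], avg


if __name__ == "__main__":
    text = build_input(
        "Imatinib inhibits ABL1 kinase activity.",
        chem="Imatinib", gene="ABL1",
        chem_desc="<chemical description, e.g. from CTD>",
        gene_desc="<gene description, e.g. from Entrez Gene>",
    )
    print(text)
    # Hypothetical class probabilities from two fine-tuned models, ordered as LABELS.
    label, avg = soft_vote([[0.1, 0.7, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1]])
    print(label)  # INHIBITOR
```

Soft voting is just one common way to combine models; majority voting or weighted averaging would fit the same interface by changing only `soft_vote`.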
Bioinformatics