Abstract: It is increasingly challenging to exploit efficiently the drug-related information described in the growing volume of scientific literature. For drug–gene/protein interactions the challenge is greater still, given the scattered information sources and the variety of interaction types. However, their systematic, large-scale exploitation is key for developing tools that impact knowledge fields as diverse as drug design and metabolic pathway research. Previous efforts to extract drug–gene/protein interactions from the literature did not address these scalability and granularity issues. To tackle them, we have organized the DrugProt track at BioCreative VII. In the context of the track, we have released the DrugProt Gold Standard corpus, a collection of 5000 PubMed abstracts manually annotated with granular drug–gene/protein interactions. We have also proposed a novel large-scale track to evaluate the capacity of natural language processing systems to scale to the range of millions of documents, and to generate, from their predictions, a silver standard knowledge graph of 53 993 602 nodes and 19 367 406 edges. Its usefulness extends beyond the shared task and points toward pharmacological and biological applications such as drug discovery or continuous database curation. In addition, we have created a persistent evaluation scenario on CodaLab to continuously evaluate new relation extraction systems that may arise. Thirty teams from four continents, involving 110 people, submitted 107 runs for the Main DrugProt track, and nine teams submitted 21 runs for the Large Scale DrugProt track. Most participants implemented deep learning approaches based on pretrained transformer-like language models (LMs) such as BERT or BioBERT, reaching precision and recall values as high as 0.9167 and 0.9542 for some relation types.
Finally, some initial explorations of the applicability of the knowledge graph have shown its potential to explore the chemical–protein relations described in the literature, or chemical compound–enzyme interactions.

Database URL: https://doi.org/10.5281/zenodo.4955410
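
As an illustration of how precision and recall per relation type, such as the values quoted in the abstract, can be computed, the following is a minimal sketch and not the official DrugProt evaluation script. It assumes each relation is a tuple of (document id, relation type, chemical id, gene id) and that a prediction counts as correct only on an exact match. The function name `per_type_scores`, the tuple layout, the identifiers and the toy data are all hypothetical.

```python
from collections import Counter


def per_type_scores(gold, pred):
    """Return {relation_type: (precision, recall, f1)}.

    gold, pred: sets of (doc_id, relation_type, chem_id, gene_id) tuples.
    The tuple layout is an assumption made for this sketch.
    """
    true_pos = Counter(rel[1] for rel in gold & pred)  # exact-match hits
    n_gold = Counter(rel[1] for rel in gold)
    n_pred = Counter(rel[1] for rel in pred)

    scores = {}
    for rel_type in set(n_gold) | set(n_pred):
        tp = true_pos[rel_type]
        p = tp / n_pred[rel_type] if n_pred[rel_type] else 0.0
        r = tp / n_gold[rel_type] if n_gold[rel_type] else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        scores[rel_type] = (p, r, f1)
    return scores


# Toy data: document ids, entity ids and labels are placeholders.
gold = {
    ("doc1", "INHIBITOR", "T1", "T2"),
    ("doc1", "SUBSTRATE", "T3", "T4"),
    ("doc2", "INHIBITOR", "T1", "T5"),
}
pred = {
    ("doc1", "INHIBITOR", "T1", "T2"),
    ("doc1", "SUBSTRATE", "T3", "T4"),
    ("doc2", "INHIBITOR", "T9", "T5"),  # wrong chemical: a false positive
}

scores = per_type_scores(gold, pred)
for rel_type in sorted(scores):
    p, r, f1 = scores[rel_type]
    print(f"{rel_type}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

Scoring each relation type separately shows how a system can score very high on some relation types while performing worse on others.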

R-BERT-CNN: Drug-target interactions extraction from biomedical literature

Using BERT to identify drug-target interactions from whole PubMed

Text Mining Drug/Chemical-Protein Interactions using an Ensemble of BERT and T5 Based Models

CU-UD: text-mining drug and chemical-protein interactions with ensembles of BERT-based models

BioBERT-based Deep Learning and Merged ChemProt-DrugProt for Enhanced Biomedical Relation Extraction

Mining drug-target interactions from biomedical literature using chemical and gene descriptions-based ensemble transformer model

Advancing Drug-Target Interaction prediction with BERT and subsequence embedding

Extracting Drug-Drug Interactions from Biomedical Texts Using BioBERT with Improved Focal Loss

An Ensemble Learning-Based Method for Inferring Drug-Target Interactions Combining Protein Sequences and Drug Fingerprints

Discovering drug–target interaction knowledge from biomedical literature

BN-DTI: A Deep Learning Based Sequence Feature Incorporating Method for Predicting Drug-Target Interaction

Extracting Drug-Drug Interactions from Texts with BioBERT and Multiple Entity-Aware Attentions.

Overview of DrugProt task at BioCreative VII: data and methods for large-scale text mining and knowledge graph generation of heterogenous chemical–protein relations

Attention-based approach to predict drug-target interactions across seven target superfamilies

Predicting drug-target interactions from drug structure and protein sequence using novel convolutional neural networks

A Convolutional Neural Network System to Discriminate Drug-Target Interactions

Dti-Rcnn: New Efficient Hybrid Neural Network Model To Predict Drug-Target Interactions

Using Novel Convolutional Neural Networks Architecture To Predict Drug-Target Interactions

FRnet-DTI: Deep Convolutional Neural Networks with Evolutionary and Structural Features for Drug-Target Interaction

BCM-DTI: A fragment-oriented method for drug-target interaction prediction using deep learning

Predicting Drug-Target Interactions with Deep-Embedding Learning of Graphs and Sequences.