A Dataset of Discovering Drug-Target Interaction from Biomedical Literature

Yutai Hou, Yingce Xia, Lijun Wu, Shufang Xie, Yang Fan, Jinhua Zhu, Wanxiang Che, Tao Qin, Tie-Yan Liu
2021-01-01
Abstract: As millions of papers come out every year in the biomedical domain, automatic knowledge discovery (KD) from biomedical literature has become an urgent demand in the industry. While KD in the biomedical domain has attracted much research attention in recent years, the lack of benchmark datasets significantly hinders its progress. In this work, we create a dataset, KD-DTI, for discovering 〈drug, target, interaction〉 triplets from literature, which is one of the most important KD tasks in the biomedical domain. KD-DTI contains 14k unique biomedical papers, each of which is associated with at least one 〈drug, target, interaction〉 triplet. We also provide a semi-supervised dataset with 139k unique papers. We present and analyze multiple solutions, including several extractive/generative models and two data enhancement methods. The results show that the performance of those models is far from industry demand, indicating that the dataset presents a challenging research problem for the community. The dataset will be freely accessible after the review process.