Recognizing Chemical Entities in Biomedical Literature using Conditional Random Fields and Structured Support Vector Machines

Buzhou Tang, Xiaolong Wang, Yonghui Wu, Min Jiang, Jingqi Wang, Hua Xu
2013-01-01
Abstract: The Spanish National Cancer Research Center (CNIO) and the University of Navarra organized a challenge on recognizing chemical compounds and drugs (chemical entities) in biomedical literature, comprising two subtasks: 1) chemical entity mention recognition (CEM); and 2) chemical document indexing (CDI). The challenge organizers manually annotated chemical entities in 10,000 abstracts from PubMed, of which 3,500 were used as a training set, 3,500 as a development set, and 3,000 as a test set. We participated in subtask 1 and developed a machine learning-based system using two state-of-the-art sequence labeling algorithms: Conditional Random Fields (CRF) and Structured Support Vector Machines (SSVM). Our best model, trained on the training set, achieved an F-measure of 0.81862 for CEM on the development set.
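The abstract reports an F-measure but does not say how it is computed. As a minimal sketch, assuming the balanced F-measure (F1) over exactly matching mention spans, the score could be calculated as below. The `Span` type, the `f_measure` function, and the exact-match criterion are illustrative assumptions, not the challenge's official evaluation script.

```python
from typing import Set, Tuple

# Assumed representation: one mention = (start, end) character offsets.
Span = Tuple[int, int]


def f_measure(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span matching (an assumption)."""
    # A prediction counts as correct only if both offsets match a gold mention.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: three gold mentions, two predictions, one exact match.
gold_spans = {(0, 7), (12, 20), (30, 35)}
predicted_spans = {(0, 7), (12, 19)}
precision, recall, f1 = f_measure(gold_spans, predicted_spans)
```

In this toy case one of two predictions is correct and one of three gold mentions is found, so a boundary error such as `(12, 19)` versus `(12, 20)` costs both precision and recall under exact matching.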