Application of Word Embeddings in Biomedical Named Entity Recognition Tasks

F. Chang, Junyu Guo, W. Xu, S. Chung
2015-01-01
Abstract: Biomedical named entity recognition (BioNER) is a fundamental task in biomedical text mining. Machine-learning-based approaches, such as conditional random fields (CRFs), have been widely applied in this area, but the accuracy of these systems is limited by the finite size of annotated corpora. In this study, word embedding features are generated from an unlabeled corpus and introduced as additional word features into a CRF-based BioNER system. To further improve performance, a post-processing algorithm is applied after named entity recognition. Experimental results show that word embedding features generated from a larger unlabeled corpus achieve higher performance, and that using word embedding features increases the F-measure on the JNLPBA04 data from 71.50% to 71.77%. After the post-processing algorithm is applied, the F-measure reaches 71.85%, which is superior to the results of most existing systems.

Subject Categories and Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing: Text Analysis; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing: Linguistic Processing

General Terms: Algorithm, Biomedical Text Mining
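The abstract does not say how the embedding vectors are encoded as CRF features. The sketch below assumes one common choice: adding each embedding dimension as a real-valued feature alongside ordinary lexical features. The toy `EMBEDDINGS` table and the function names `token_features` and `sentence_features` are hypothetical; real vectors would be trained on a large unlabeled biomedical corpus, and the post-processing algorithm is not modeled because the abstract does not describe it.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embedding table (assumption). In practice these vectors
# would be learned from a large unlabeled biomedical corpus.
EMBEDDINGS: Dict[str, List[float]] = {
    "il-2": [0.12, -0.40, 0.33],
    "gene": [0.05, 0.21, -0.17],
    "expression": [-0.30, 0.08, 0.44],
}


def token_features(tokens: Sequence[str], i: int) -> Dict[str, object]:
    """Build a CRF feature dict for tokens[i]: baseline lexical features
    plus one real-valued feature per embedding dimension (assumed encoding)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.hasdigit": any(c.isdigit() for c in word),
        "suffix3": word[-3:].lower(),
    }
    # Neighbouring words, a typical baseline CRF context feature.
    feats["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next.lower"] = tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>"
    # Embedding features; out-of-vocabulary words simply get none.
    vec = EMBEDDINGS.get(word.lower())
    if vec is not None:
        for d, value in enumerate(vec):
            feats[f"emb{d}"] = value
    return feats


def sentence_features(tokens: Sequence[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sent = ["IL-2", "gene", "expression", "requires", "NF-kB"]
    for f in sentence_features(sent):
        print(f)
```

A list of such per-sentence feature-dict lists is the input shape accepted by dict-based CRF toolkits such as sklearn-crfsuite; the baseline CRF corresponds to dropping the `emb*` features, so the two configurations differ only in the embedding block.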