Enhanced Identifying Gene Names from Biomedical Literature with Conditional Random Fields

Wei-Zhong Qian, Chong Fu, Hong-Rong Cheng, Qiao Liu, and Zhi-Guang Qin are with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 61005, China
2009-01-01
Journal of Electronic Science and Technology
Abstract: Identifying gene names is an attractive research area in biological computing. However, accurate extraction of gene names is challenging because there are no established conventions for describing them. We devise a systematic architecture and apply a model based on conditional random fields (CRFs) to extract gene names from Medline. To improve performance, biomedical ontology features are added to the model, and post-processing, consisting of boundary adjustment and a word filter, is introduced to resolve the name overlapping problem and to remove false-positive single words. A pure string-matching method, baseline CRFs, and CRFs with our methods are each applied to the extraction of human gene names and HIV gene names from 1100 Medline abstracts, and their performance is compared. The results show that CRFs are robust for unseen gene names. Furthermore, CRFs with our methods outperform the other methods, with a precision of 0.818 and a recall of 0.812.
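The precision and recall figures quoted in the abstract can be made concrete with a minimal sketch. It assumes exact-match scoring over (abstract id, start, end) spans; the abstract does not state the matching criterion the authors used, so this is an assumption, and the function name `precision_recall` and the toy spans are hypothetical and not taken from the paper.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over gene-mention spans.

    Each mention is a hypothetical (abstract_id, start, end) tuple.
    Exact-span matching is an assumption; the paper's abstract does not
    say which matching criterion was used.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted mentions that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold mentions that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data invented for illustration only (not results from the paper).
gold = {("a1", 0, 4), ("a1", 10, 15), ("a2", 3, 8), ("a2", 20, 24)}
predicted = {("a1", 0, 4), ("a1", 10, 15), ("a2", 3, 9), ("a2", 20, 24)}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case one predicted span has a wrong right boundary, so it counts as both a false positive and a missed gold mention. That is the kind of error a boundary-adjustment step would aim to reduce.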