Gene Name Automatic Recognition in Biomedical Literature

Zhihao Yang, Hongfei Lin, Jing Zhao
DOI: https://doi.org/10.1109/wcica.2006.1713819
2006-01-01
Abstract: Identifying gene names in biomedical texts is regarded as a crucial step in text mining. Our approach combines a dictionary-based method with a machine-learning-based method. Using a gene name dictionary, an edit-distance approximate string searching algorithm was used to improve the recall of gene recognition, which is otherwise greatly lowered by the lack of standard gene-naming conventions. Naive Bayes and SVM classifiers were then adopted to filter out false recognitions, thereby improving the precision of gene recognition. The experiments show that the classifiers greatly improve precision with a slight loss of recall, resulting in a much better F-score (from 53.7% to 67.6%).
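The dictionary stage described in the abstract can be illustrated with a minimal sketch of edit-distance approximate matching. This is not the authors' implementation: the function names, the token-by-token comparison, the case folding, and the `max_distance=1` threshold are all assumptions chosen for illustration, and the tiny dictionary is made up for the example.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def find_gene_candidates(tokens: List[str],
                         dictionary: List[str],
                         max_distance: int = 1) -> List[Tuple[str, str, int]]:
    """Return (token, dictionary entry, distance) for every near match.

    Hypothetical helper: the threshold and case folding are assumptions,
    not parameters reported in the paper.
    """
    candidates = []
    for token in tokens:
        for entry in dictionary:
            distance = edit_distance(token.lower(), entry.lower())
            if distance <= max_distance:
                candidates.append((token, entry, distance))
    return candidates


# Illustrative input only; the dictionary entries are examples.
tokens = "The BRCA-1 and p53 genes interact".split()
print(find_gene_candidates(tokens, ["BRCA1", "TP53"]))
```

Loosening the match in this way recovers spelling variants such as "BRCA-1", but it also lets ordinary words that happen to lie close to a dictionary entry through, which is why the abstract adds a classifier stage to filter out false recognitions.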