The Method of Medical Named Entity Recognition Based on Semantic Model and Improved SVM-KNN Algorithm

Xia Han,Rao Ruonan
DOI: https://doi.org/10.1109/skg.2011.24
2011-01-01
Abstract:In the medical field, a lot of unstructured information which is expressed by natural language exists in medical literature, technical documentation and medical records. IE (Information Extraction) as one of the most important research directions in natural language process aims to help humans extract concerned information automatically. NER (Named Entity Recognition) is one of the subsystems of IE and has direct influence on the quality of IE. Nowadays NER of medical field has not reached ideal precision largely due to the knowledge complexity in medical field. It is hard to describe the medical entity definitely in feature selection and current selected features are not rich and lack of semantic information. Besides that, different medical entities which have the similar characters more easily cause classification algorithm making wrong judgment. Combination multi classification algorithms such as SVM-KNN can overcome its own disadvantages and get higher performance. But current SVM-KNN classification algorithm may provide wrong categories results due to setting inappropriate K value and unbalanced examples distribution. In this paper, we introduce two-level modelling tool to help specialists to build semantic models and select features from them. We design and implement medical named entity recognition analysis engine based on UIMA framework and adopt improved SVM-KNN algorithm called EK-SVM-KNN (Extending K SVM-KNN) in classification. We collect experiment data from SLE(Systemic Lupus Erythematosus) clinical information system in Renji Hospital. We adopt 50 Pathology reports as training data and provide 1000 Pathology as test data. Experiment shows medical NER based on semantic model and improved SVM-KNN algorithm can enhance the quality of NER and we get the precision, recall rate and F value up to 86%.
What problem does this paper attempt to address?