Named entity recognition in Chinese electronic medical records based on multi-feature integration

Nan YU, Pu WANG, Zhuang WENG, Liying FANG
DOI: https://doi.org/10.3969/j.issn.1002-3208.2018.03.010
2018-01-01
Abstract: Objective To enable structured storage of electronic medical records (EMRs) and to support EMR information mining and statistical analysis, we establish a conditional random field (CRF) model with multi-feature integration that automatically identifies diseases and symptoms in the unstructured, natural-language components (medical diagnosis and patients' condition) of EMRs from a tertiary hospital. Methods The manually labeled corpus was divided into a training set and a test set. We segmented the text with NLPIR and ran the experiments with the CRF++ tool. Based on the data characteristics of Chinese EMRs, we selected basic features and templates and determined the size of the context window through comparative experiments. We then added two advanced features, the guide word pattern and the word formation pattern, and compared their effects on the experimental results. Results With basic features only, recognition performance was better when the context window was 7. After the advanced features were added, the F-measure reached 92.80% for disease entities and 94.17% for symptom entities. Conclusions A CRF model with multi-feature integration can achieve high recognition performance for disease and symptom entities in EMRs. The study is of great significance to named entity recognition in EMRs.
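The feature design described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' CRF++ template. It assumes that a context window of 7 means the current token plus three tokens on each side, and the guide words, suffix characters, function name, and sample sentence are all hypothetical values chosen for illustration.

```python
from typing import Dict, List

# Hypothetical guide words (words that tend to precede an entity); not from the paper.
GUIDE_WORDS = {"诊断", "主诉"}
# Hypothetical word-formation cues (final characters common in disease/symptom terms).
ENTITY_SUFFIXES = {"病", "炎", "症", "痛"}


def token_features(tokens: List[str], i: int, window: int = 7) -> Dict[str, object]:
    """Build a feature dict for tokens[i] from a symmetric context window.

    Assumption: `window` counts the current token plus window // 2
    neighbours on each side, so window=7 covers offsets -3..+3.
    """
    half = window // 2
    feats: Dict[str, object] = {}
    # Basic features: the surface form of every token inside the window.
    for offset in range(-half, half + 1):
        j = i + offset
        feats[f"w[{offset}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    # Guide word pattern (illustrative): a guide word occurs in the left context.
    left_context = tokens[max(0, i - half):i]
    feats["guide_word_before"] = any(t in GUIDE_WORDS for t in left_context)
    # Word formation pattern (illustrative): the token ends with a typical suffix.
    feats["entity_suffix"] = tokens[i][-1] in ENTITY_SUFFIXES
    return feats


# Pre-segmented sample sentence (in the paper, segmentation is done with NLPIR).
sample = ["患者", "诊断", "为", "高血压", ",", "伴", "头痛"]
features = [token_features(sample, k) for k in range(len(sample))]
```

Each dictionary in `features` corresponds to one token and could be converted into the column format that a CRF toolkit expects. Changing `window` reproduces the kind of contrast experiment the abstract mentions for choosing the context size.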