A Template-Driven Framework for Chinese Medical Named Entity Recognition

Yilin Song, Fang Kong, Shengjie Ji
DOI: https://doi.org/10.1007/978-981-97-5672-8_34
2024-01-01
Abstract: Named Entity Recognition (NER) for Chinese Electronic Medical Records (EMR) is the task of identifying predefined entities in EMR text, and it is a crucial step in clinical text mining. Traditional sequence labeling methods face challenges in the medical field, such as data scarcity and difficulty in accurately recognizing complex medical entities. We therefore propose a template-based Chinese medical NER framework. First, we design an auxiliary task of global type prediction to better align our downstream task with BERT's masked language model pre-training task, making fuller use of the pre-trained language model. Second, we design a candidate entity classification module that uses instances to enrich the entity description representations and scores each span in the text against the entity descriptions. Finally, we conduct experiments in both full-dataset and few-shot scenarios. Experiments on the full CCKS2017, cMeDQANER, and Resume datasets show that our method achieves significantly better results than the baselines, indicating competitive performance in both general-domain and medical scenarios. In addition, we partitioned the CCKS2017 dataset into three few-shot learning scenarios: 5~10-shot, 25~35-shot, and a 10% low-resource setting. Our method achieved significantly better performance than the baseline method in all of these few-shot scenarios.
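The abstract does not say how spans are represented or scored, so the following is only a minimal sketch of the general idea of scoring candidate spans against entity-type descriptions. It assumes token vectors and description vectors are already available (for example from an encoder such as BERT), and it uses mean pooling, a dot-product score, and a fixed threshold purely for illustration. The names `enumerate_spans`, `score_spans`, `max_len`, and `threshold` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def enumerate_spans(tokens: list, max_len: int) -> List[Tuple[int, int]]:
    """Return every (start, end) span, end exclusive, of at most max_len tokens."""
    n = len(tokens)
    return [
        (start, end)
        for start in range(n)
        for end in range(start + 1, min(start + max_len, n) + 1)
    ]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average token vectors into one span vector (an illustrative choice)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as an assumed similarity score."""
    return sum(x * y for x, y in zip(a, b))


def score_spans(
    token_vecs: List[Vector],
    type_vecs: Dict[str, Vector],
    max_len: int,
    threshold: float,
) -> List[Tuple[int, int, str, float]]:
    """Keep spans whose best-matching type description scores >= threshold."""
    results = []
    for start, end in enumerate_spans(token_vecs, max_len):
        span_vec = mean_pool(token_vecs[start:end])
        # Pick the entity type whose description vector best matches the span.
        best_type, best_score = max(
            ((name, dot(span_vec, vec)) for name, vec in type_vecs.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            results.append((start, end, best_type, best_score))
    return results


# Toy vectors standing in for encoder outputs (not real data).
toy_tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_types = {"symptom": [1.0, 0.0], "drug": [0.0, 1.0]}
predictions = score_spans(toy_tokens, toy_types, max_len=2, threshold=0.9)
```

In this toy setting the span covering the second and third tokens mixes both directions and falls below the threshold, so it is dropped. A real system would also need to resolve overlapping predictions, which this sketch does not attempt.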