Named Entity Recognition of TCM Classics Based on SiKuBERT and Multi-Feature Embedding

Ziwei Wu, Yuxuan Liu, Qingao Huo, Wendong Zhang
DOI: https://doi.org/10.1109/ICCECE58074.2023.10135391
2023-01-01
Abstract: Named entity recognition (NER) of traditional Chinese medical (TCM) classics is the basis for constructing knowledge graphs of Chinese medicine. However, current research in this field is insufficient: the corpus is sparse, and the rich semantic information carried by the special structural features of ancient Chinese characters has not been considered. We therefore construct the Herb dataset. We also propose a model based on SiKuBERT and multi-feature embedding that combines the characteristics of the corpus with the special structural information of ancient Chinese characters, which is considered here for the first time in the field of TCM classics named entity recognition. The results demonstrate that our model can effectively identify five distinct types of entities with an F1-score of 86.66%, a precision of 86.95%, and a recall of 86.37%, outperforming other popular deep learning models.
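As a quick consistency check on the reported figures, the sketch below recomputes the F1-score from the stated precision and recall. It assumes the F1-score is the plain harmonic mean of those two aggregate values; the abstract does not say how the scores were averaged across the five entity types, and the function name `f1_score` is illustrative only, not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
reported_precision = 86.95
reported_recall = 86.37

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 86.66, matching the reported F1-score
```

Under that assumption, the three reported numbers agree with one another.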