A Nested Named Entity Recognition Method for Traditional Chinese Medicine Records

Haifeng Xu, Honglan Liu, Qi Jia, Yuxiao Zhan, Yan Zhang, Yonghong Xie
DOI: https://doi.org/10.1007/978-3-030-78615-1_43
2021-01-01
Abstract: With the development of deep neural networks, named entity recognition (NER) has been well studied in many domains. One popular application of NER is the information extraction and structuring of Traditional Chinese Medicine (TCM) literature. TCM records summarize TCM knowledge and experience, but they pose obstacles for common machine learning methods: the TCM corpus contains a large number of nested entities. In addition, because of entity boundary problems caused by the difference between words and characters in the TCM corpus, many methods that perform well on English datasets are not suitable for NER in the TCM field. To address these problems, we propose a nested NER model for TCM records. First, we use word-character-level embeddings so that the model achieves more accurate extraction results on the TCM records corpus. Then, based on the main entity categories that need to be recognized in TCM records, we design a two-layer labelling strategy, which allows our nested NER model to extract more fine-grained results and better supports follow-up work such as knowledge base construction. Finally, we conduct experiments to verify that our model can effectively perform the NER task on TCM records.
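The abstract does not specify the two-layer labelling strategy in detail. As one hypothetical illustration of how nested entities can be flattened into two tag layers, the Python sketch below assigns the outermost character spans to a first BIO layer and the spans nested inside them to a second layer. The BIO scheme, the (start, end, label) span format, the function names, and the labels SYN (syndrome) and ORG (organ) are all assumptions made for illustration; they are not the authors' scheme or entity categories.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, label) over character indices,
# with `end` exclusive. Labels such as "SYN" and "ORG" are illustrative only.
Span = Tuple[int, int, str]


def contains(outer: Span, inner: Span) -> bool:
    """Return True if `inner` lies within `outer` and is a different span."""
    return outer[0] <= inner[0] and inner[1] <= outer[1] and outer != inner


def check_disjoint(layer: List[Span]) -> None:
    """Raise ValueError if any two spans in one layer overlap."""
    ordered = sorted(layer)
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt[0] < prev[1]:
            raise ValueError(f"overlapping spans in one layer: {prev}, {nxt}")


def bio_tags(length: int, layer: List[Span]) -> List[str]:
    """Encode a layer of non-overlapping spans as one BIO tag sequence."""
    tags = ["O"] * length
    for start, end, label in layer:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def two_layer_tags(text: str, spans: List[Span]) -> Tuple[List[str], List[str]]:
    """Split character-level spans into an outer and an inner BIO layer.

    Assumption: nesting is at most two levels deep; deeper nesting or
    crossing spans raise ValueError instead of being silently dropped.
    """
    spans = sorted(set(spans))  # drop exact duplicates
    # Outer layer: spans not contained in any other span.
    outer = [s for s in spans if not any(contains(o, s) for o in spans)]
    # Inner layer: everything nested inside some outer span.
    inner = [s for s in spans if s not in outer]
    for s in inner:
        if any(contains(i, s) for i in inner):
            raise ValueError(f"nesting deeper than two layers: {s}")
    check_disjoint(outer)
    check_disjoint(inner)
    return bio_tags(len(text), outer), bio_tags(len(text), inner)
```

For example, two_layer_tags("肝气郁结", [(0, 4, "SYN"), (0, 1, "ORG")]) yields ["B-SYN", "I-SYN", "I-SYN", "I-SYN"] for the outer layer and ["B-ORG", "O", "O", "O"] for the inner layer. Because each layer is an ordinary non-overlapping tag sequence, a character-level tagger could predict the two layers separately; nesting deeper than two levels would need additional layers under this scheme.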