Incorporating Lexicon for Named Entity Recognition of Traditional Chinese Medicine Books.

Bingyan Song, Zhenshan Bao, YueZhang Wang, Wenbo Zhang, Chao Sun
DOI: https://doi.org/10.1007/978-3-030-60457-8_39
2020-01-01
Abstract: Little research has been done on Named Entity Recognition (NER) for Traditional Chinese Medicine (TCM) books, and most existing work uses statistical models such as Conditional Random Fields (CRFs). However, these methods do not fully exploit lexicon information or large-scale unlabeled corpora. To improve NER performance on TCM books, we propose a method based on the BiLSTM-CRF model that incorporates lexicon information into the representation layer to enrich its semantic information. We compared our approach with several previous character-based and word-based methods. Experiments on the “Shanghan Lun” dataset show that our method outperforms previous models. In addition, because no large corpus was previously available in this field, we collected 376 TCM books to construct a large-scale corpus from which to obtain pre-trained vectors. We have released the corpus and pre-trained vectors to the public.
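The abstract does not spell out how lexicon information enters the representation layer. The sketch below shows one common approach, offered as an assumption and not as the authors' method. It is similar in spirit to the BMES word sets used by SoftLexicon. Each character is matched against lexicon words, and the sketch records which matched words the character begins (B), continues (M), ends (E), or forms alone (S). A summary of those sets is then appended to the character's embedding before it is fed to the BiLSTM. The toy lexicon, the `max_len` cap, and the names `match_lexicon` and `enrich` are all hypothetical. A real model would pool pre-trained word vectors, such as the ones released with this work, rather than use raw counts.

```python
from typing import Dict, List, Set


def match_lexicon(sentence: str, lexicon: Set[str],
                  max_len: int = 4) -> List[Dict[str, Set[str]]]:
    """Hypothetical helper: for each character, collect matched lexicon words
    by the character's position in the word (B=begin, M=middle, E=end,
    S=single-character word). Words longer than max_len are ignored."""
    feats = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        # Try every substring starting at i, up to max_len characters long.
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                feats[i]["S"].add(word)
            else:
                feats[i]["B"].add(word)
                feats[j - 1]["E"].add(word)
                for k in range(i + 1, j - 1):
                    feats[k]["M"].add(word)
    return feats


def enrich(char_vec: List[float], feat: Dict[str, Set[str]]) -> List[float]:
    """Hypothetical helper: append one count per BMES slot to a character
    embedding. This stands in for pooling pre-trained word vectors."""
    return char_vec + [float(len(feat[tag])) for tag in "BMES"]


if __name__ == "__main__":
    # Toy lexicon (assumption), applied to a Shanghan Lun-style phrase.
    lexicon = {"太阳", "太阳病", "发热", "汗", "汗出"}
    feats = match_lexicon("太阳病发热汗出", lexicon)
    print(enrich([0.1, 0.2], feats[1]))  # [0.1, 0.2, 0.0, 1.0, 1.0, 0.0]
```

With this scheme the second character 阳 is marked as the end of 太阳 and the middle of 太阳病. That way the character-level BiLSTM sees word-boundary evidence without committing to a single word segmentation.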
What problem does this paper attempt to address?