Named entity recognition (NER) for Chinese agricultural diseases and pests based on discourse topic and attention mechanism
Chao Wang, Jiale Gao, Haidi Rao, Aiwen Chen, Jin He, Jun Jiao, Nengfeng Zou, Lichuan Gu
DOI: https://doi.org/10.1007/s12065-022-00727-w
2022-05-29
Evolutionary Intelligence
Abstract: Named entities of agricultural diseases and pests are characterized by complex word formation and by widespread word combination and entity embedding. In the domain of Chinese agricultural diseases and pests in particular, recognition is hindered by varied entity naming conventions, fuzzy entity boundaries, inadequate feature extraction, and inconsistent labeling of entity boundaries. To address these problems, this article combines discourse topic features with an attention mechanism and proposes Attention-based SoftLexicon with TF-IDF (ASLT) for recognizing agricultural disease and pest entities. The method divides matched words into sets according to the position of each character within the word, merges discourse topic features into the calculation of lexical information, and introduces an attention mechanism, which together improve the recognition accuracy for Chinese agricultural disease and pest entities. To improve the interpretability of the model, we designed a flow chart that explains its major principles and steps, and we explain the model's behavior through visualization. We selected 1061 Chinese agricultural news texts and constructed the Corpus of Chinese Named Entities of Diseases and Pests (CCNEDP), in which a total of 7806 named entities of agricultural diseases and pests are labeled. Experimental results show that ASLT effectively recognizes entities in Chinese agricultural texts, achieving a recognition accuracy (precision), recall, and F1 score of 93.57%, 92.79%, and 93.18%, respectively, on CCNEDP. Compared with other entity recognition methods, ASLT shows better recognition accuracy and operating efficiency. The implementation of this work is publicly available at https://github.com/azureskymoon/Lexicon-TFIDF-DTopic-master/tree/master.
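The position-based word sets described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy lexicon, the two example documents, and the smoothed TF-IDF formula are all assumptions made for illustration, and the learned attention mechanism is replaced here by a simple normalization of TF-IDF scores within each set.

```python
import math
from collections import Counter


def matched_words(text, lexicon, max_len):
    """Yield (start, end, word) for every substring of text found in the lexicon."""
    for i in range(len(text)):
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            if text[i:j] in lexicon:
                yield i, j, text[i:j]


def bmes_word_sets(sentence, lexicon, max_len=5):
    """For each character, group matched words by the character's position in the word:
    B (begin), M (middle), E (end), S (single-character word)."""
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    for i, j, word in matched_words(sentence, lexicon, max_len):
        if j - i == 1:
            sets[i]["S"].add(word)
            continue
        sets[i]["B"].add(word)
        sets[j - 1]["E"].add(word)
        for k in range(i + 1, j - 1):
            sets[k]["M"].add(word)
    return sets


def tfidf_scores(documents, lexicon, max_len=5):
    """Per-document TF-IDF of matched lexicon words (smoothed IDF, an assumed variant)."""
    per_doc = [Counter(w for _, _, w in matched_words(d, lexicon, max_len))
               for d in documents]
    doc_freq = Counter(w for counts in per_doc for w in counts)
    n_docs = len(documents)
    scores = []
    for counts in per_doc:
        total = sum(counts.values()) or 1
        scores.append({
            w: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
            for w, c in counts.items()
        })
    return scores


def set_weights(words, doc_scores):
    """Normalize TF-IDF scores within one word set so the weights sum to 1.
    This stands in for the learned attention weights, which are not reproduced here."""
    total = sum(doc_scores.get(w, 0.0) for w in words)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: doc_scores.get(w, 0.0) / total for w in words}


# Hypothetical toy data, not taken from CCNEDP.
lexicon = {"小麦", "条锈病", "锈病", "小麦条锈病", "病"}
documents = ["小麦条锈病防治", "水稻病虫害"]

sets = bmes_word_sets(documents[0], lexicon)
scores = tfidf_scores(documents, lexicon)
weights = set_weights(sets[4]["E"], scores[0])  # words ending at the character 病
```

In this toy example the character 病 appears in both documents, so its TF-IDF score in the first document is lower than that of the disease names that occur only there, which is the kind of document-level signal the weighting is meant to capture.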