Mongolian Medicine Named Entity Recognition via Dictionary-Based Synonym Generalization

Si Qin, Feilong Bao, Uuganbaatar Dulamragchaa
DOI: https://doi.org/10.1109/CCIS59572.2023.10262847
2023-01-01
Abstract: Named entity recognition (NER) in Mongolian medicine is challenging because its terminology and herbal names are diverse and complex. The same herb or term is often named differently across literature sources and regions. Traditional dictionary-based methods rarely cover all of these synonyms and variants, so they miss many entities. To address this issue, this paper presents a lexicon-based synonym generalization method. Using Mongolian lexical resources, the approach extends the synonym sets of common herbs and terms and incorporates these additional synonyms and related words into the NER system, enabling it to recognize a wider range of names and variants. The authors also construct the Mongolian Medicine Named Entity Recognition Datasets (MMNER), which contain 44,968 entities across five entity types and were developed with the proposed synonym generalization framework. A series of deep learning models is then compared experimentally on MMNER. The results show that improving synonym coverage significantly improves the recognition of Mongolian medicine entities in text, reaching an F1 score of 93.66%, a notable gain in Mongolian medicine NER performance.
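To make the idea of synonym generalization concrete, the sketch below shows one plausible reading of it. It is not the authors' implementation. It assumes a seed dictionary mapping canonical entities to types and a synonym lexicon mapping each canonical entity to its variants. Generalization copies each seed's type onto its synonyms, and a greedy longest-match lookup then tags tokens with BIO labels. All names (`SYNONYM_LEXICON`, `SEED_ENTITIES`, `generalize`, `bio_tag`) and the placeholder terms are hypothetical stand-ins for real Mongolian lexical entries.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical synonym lexicon: canonical term -> known synonyms/variants.
# Real entries would come from Mongolian lexical resources; these are placeholders.
SYNONYM_LEXICON: Dict[str, Set[str]] = {
    "herb_a": {"herb_a_variant", "herb a regional"},
    "disease_x": {"disease_x_alias"},
}

# Hypothetical seed dictionary: canonical entity -> entity type.
SEED_ENTITIES: Dict[str, str] = {
    "herb_a": "HERB",
    "disease_x": "DISEASE",
}


def generalize(seed: Dict[str, str], lexicon: Dict[str, Set[str]]) -> Dict[str, str]:
    """Expand each seed entry with its synonyms; each synonym inherits the seed's type."""
    expanded = dict(seed)
    for canonical, etype in seed.items():
        for syn in lexicon.get(canonical, set()):
            expanded.setdefault(syn, etype)  # never overwrite an existing entry
    return expanded


def bio_tag(tokens: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Greedy longest-match dictionary lookup that emits one BIO label per token."""
    # Multi-word dictionary entries are matched as token tuples.
    entries: Dict[Tuple[str, ...], str] = {tuple(k.split()): v for k, v in dictionary.items()}
    max_len = max((len(k) for k in entries), default=0)
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in entries:
                match = (n, entries[span])
                break
        if match:
            n, etype = match
            labels[i] = "B-" + etype
            for j in range(i + 1, i + n):
                labels[j] = "I-" + etype
            i += n
        else:
            i += 1
    return labels


if __name__ == "__main__":
    tokens = ["take", "herb", "a", "regional", "for", "disease_x_alias"]
    print(bio_tag(tokens, SEED_ENTITIES))
    # -> ['O', 'O', 'O', 'O', 'O', 'O']  (seed names alone miss both variants)
    print(bio_tag(tokens, generalize(SEED_ENTITIES, SYNONYM_LEXICON)))
    # -> ['O', 'B-HERB', 'I-HERB', 'I-HERB', 'O', 'B-DISEASE']
```

Here the seed dictionary finds nothing, while the generalized dictionary recovers both variant mentions, which mirrors the coverage argument in the abstract. Longest-match is used so that a multi-word variant is not split into shorter partial matches. The paper's actual system additionally trains deep learning models on the resulting data, which this sketch does not attempt.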