Exploiting Morpheme and Cross-lingual Knowledge to Enhance Mongolian Named Entity Recognition
Songming Zhang, Ying Zhang, Yufeng Chen, Du Wu, Jinan Xu, Jian Liu
DOI: https://doi.org/10.1145/3511098
IF: 1.471
2022-01-01
ACM Transactions on Asian and Low-Resource Language Information Processing
Abstract: Mongolian named entity recognition (NER) is one of the most fundamental tasks in Mongolian natural language processing and an important step toward improving downstream tasks such as information retrieval, machine translation, and dialog systems. However, traditional Mongolian NER models rely heavily on feature engineering, and the complex morphological structure of Mongolian words makes the data even sparser. To reduce the reliance on feature engineering and alleviate data sparsity, we propose a novel NER framework with Multi-Knowledge Enhancement (MKE-NER). Specifically, we introduce linguistic knowledge through Mongolian morpheme representations and cross-lingual knowledge from a Mongolian-Chinese parallel corpus. Furthermore, we design two methods to fully exploit the cross-lingual knowledge: cross-lingual representation and cross-lingual annotation projection. Experimental results demonstrate the effectiveness of MKE-NER, which outperforms strong baselines and achieves the best performance (94.04% F1 score) on the traditional Mongolian benchmark. In particular, extensive experiments with different data scales highlight the superiority of our method in low-resource scenarios.
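
Illustrative sketch (not the authors' implementation): the abstract names cross-lingual annotation projection but does not describe its mechanics. A common generic form of the idea copies entity labels from the tokens of one sentence to the aligned tokens of its translation. The example below assumes BIO labels on the source side, a precomputed list of word alignments given as (source index, target index) pairs, and a projection direction from the labeled side of the parallel corpus to the unlabeled side. The function name `project_labels` and the toy inputs are hypothetical.

```python
from typing import List, Optional, Tuple


def project_labels(
    src_labels: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project BIO labels from source tokens onto target tokens.

    Assumption: `alignment` holds (source_index, target_index) pairs
    produced by some external word aligner; unaligned targets stay "O".
    """
    # Step 1: record the entity type each target token inherits, if any.
    tgt_types: List[Optional[str]] = [None] * tgt_len
    for src_idx, tgt_idx in alignment:
        label = src_labels[src_idx]
        if label != "O":
            # "B-PER" / "I-PER" -> "PER"
            tgt_types[tgt_idx] = label.split("-", 1)[1]

    # Step 2: rebuild BIO tags in target word order, because the
    # translation may reorder tokens relative to the source.
    tgt_labels: List[str] = []
    prev: Optional[str] = None
    for etype in tgt_types:
        if etype is None:
            tgt_labels.append("O")
        elif etype == prev:
            tgt_labels.append("I-" + etype)
        else:
            tgt_labels.append("B-" + etype)
        prev = etype
    return tgt_labels


# Toy example with placeholder indices rather than real sentences.
source_labels = ["B-PER", "I-PER", "O", "B-LOC"]
word_alignment = [(0, 1), (1, 2), (2, 0), (3, 3)]
projected = project_labels(source_labels, word_alignment, tgt_len=5)
print(projected)
```

One limitation of this simplified scheme: two distinct entities of the same type that end up adjacent on the target side are merged into a single span, and alignment errors propagate directly into the projected labels.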