MTNER: A Corpus for Mongolian Tourism Named Entity Recognition

Xiao Cheng, Weihua Wang, Feilong Bao, Guanglai Gao
DOI: https://doi.org/10.1007/978-981-33-6162-1_2
2020-01-01
Abstract: Named Entity Recognition is an essential tool for machine translation. Traditional Named Entity Recognition focuses on person, location, and organization names. However, there is still a lack of data for identifying travel-related named entities, especially in Mongolian. In this paper, we introduce a new corpus for Mongolian Tourism Named Entity Recognition (MTNER), consisting of 16,000 sentences annotated with 18 entity types. We trained in-domain BERT representations on 10 GB of unannotated Mongolian text and trained an NER model based on the BERT tagging model with the new corpus. The model achieves an overall F1 score of 82.09 on Mongolian Tourism Named Entity Recognition, an absolute increase of +3.54 F1 over the traditional CRF Named Entity Recognition method.
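For readers unfamiliar with how an F1 score for NER is typically computed, the following is a minimal sketch of entity-level F1 over BIO-tagged sequences. It rests on assumptions not stated in the abstract: that tags follow the BIO scheme, that an entity counts as correct only when both its span and its type match exactly, and that scores are micro-averaged. The function names `bio_to_spans` and `entity_f1` and the tag labels `LOC` and `ORG` are hypothetical and are not taken from the MTNER annotation scheme.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        # The current entity (if any) ends here.
        if etype is not None:
            spans.add((start, i, etype))
        # Assumption: a stray "I-" tag is treated leniently as an entity start.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, etype = i, tag[2:]
        else:
            start, etype = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 with exact span and type matching."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = bio_to_spans(gold_tags)
        pred_spans = bio_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example with hypothetical labels: one of two gold entities is found.
gold = [["B-LOC", "I-LOC", "O", "B-ORG"]]
pred = [["B-LOC", "I-LOC", "O", "O"]]
score = entity_f1(gold, pred)
```

In the toy example, precision is 1.0 and recall is 0.5, so the F1 score is 2/3. Reported scores such as 82.09 are this quantity expressed as a percentage, under the assumptions above.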