Language resource construction for Mongolian.

Shipeng Xu, Hongzhi Yu, Thomas Fang Zheng, Guanyu Li, Gegeentana
DOI: https://doi.org/10.1109/APSIPA.2017.8282132
2017-01-01
Abstract: Mongolian is a typical low-resource language. Its resource limitations span many aspects, from acoustic analysis and phonetic rules to lexicons and speech and text data. This paper describes our recent progress on Mongolian resource construction, supported by the NSFC M2ASR project. First, we collected Mongolian text data containing more than 60,000 sentences from newspapers, the internet, and Mongolian books. Second, we built an initial Mongolian dictionary based on the Mongolian Chinese Dictionary. All the resources are published under the M2ASR Free Data Program.