Free Linguistic and Speech Resources for Tibetan

Guanyu Li, Hongzhi Yu, Thomas Fang Zheng, Jinghao Yan, Shipeng Xu
DOI: https://doi.org/10.1109/apsipa.2017.8282130
2017-01-01
Abstract: Tibetan is an important low-resource language in China. A key factor hindering speech and language research for Tibetan is the lack of resources, particularly free ones. This paper describes our recent progress on Tibetan resource construction, supported by the NSFC M2ASR project, including the phone set, the lexicon, and the transcription of a large-scale speech corpus. Following the M2ASR free data program, all of these resources are publicly available and free for researchers. We also release a small Tibetan speech database that can be used to build a prototype Tibetan speech recognition system.