Design and implementation of Tibetan continuous speech corpus

LI Yong-hong, YU Hong-zhi, KONG Jiang-ping
DOI: https://doi.org/10.3778/j.issn.1002-8331.2010.13.069
2010-01-01
Abstract: This paper builds a triphone-based continuous speech corpus for the Xiahe dialect of Tibetan. First, a text corpus of 100,000 sentences is collected and transcribed into IPA according to the pronunciation of the Xiahe dialect. Next, the structure of triphone junctures is summarized, and the combination types and frequencies of triphones in the corpus are analyzed statistically in detail using a text-processing platform. Finally, an algorithm for corpus extraction is designed that jointly considers the coverage rate and sparseness of triphones and class-triphones, and automatic selection of the corpus is realized.
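The abstract does not spell out the extraction algorithm, so the following is only a minimal sketch of one common approach to this kind of selection: counting triphones and then greedily picking the sentences that add the most uncovered triphone types. The function names, the `sil` boundary padding, the `l-c+r` label format, and the toy phone sequences are all assumptions made for illustration. The sketch ignores class-triphones and sparseness, which the paper's algorithm also takes into account.

```python
from collections import Counter


def triphones(phones):
    """Return the triphone labels (left-center+right) of one phone sequence.

    The sequence is padded with a hypothetical 'sil' boundary symbol.
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def triphone_counts(corpus):
    """Count how often each triphone occurs across all sentences."""
    counts = Counter()
    for phones in corpus:
        counts.update(triphones(phones))
    return counts


def greedy_select(corpus, max_sentences):
    """Greedily pick sentences that add the most not-yet-covered triphone types.

    Returns the selected sentence indices and the set of covered triphones.
    Ties are broken in favour of the lower sentence index.
    """
    covered = set()
    selected = []
    remaining = {i: set(triphones(p)) for i, p in enumerate(corpus)}
    while remaining and len(selected) < max_sentences:
        best = max(remaining, key=lambda i: (len(remaining[i] - covered), -i))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining sentence adds a new triphone type
        covered |= gain
        selected.append(best)
        del remaining[best]
    return selected, covered


# Toy phone sequences standing in for IPA-transcribed sentences.
corpus = [["k", "a"], ["k", "a", "m"], ["m", "a"]]
counts = triphone_counts(corpus)
selected, covered = greedy_select(corpus, max_sentences=2)
coverage_rate = len(covered) / len(counts)
```

Here `coverage_rate` is the fraction of distinct triphone types in the full text corpus that the selected subset covers. A fuller version would presumably weight rare triphones and fall back to class-triphones where individual triphones are too sparse.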