UY/CH-CHILD -- A Public Chinese L2 Speech Database of Uyghur Children

Mewlude Nijat, Chen, Dong Wang, Askar Hamdulla
DOI: https://doi.org/10.21437/interspeech.2024-135
2024-01-01
Abstract: Exploring how pronunciation skills develop during second language (L2) acquisition in children is an intriguing research avenue. Yet our understanding of this process for Uyghur children learning Chinese as their L2 has been constrained by a scarcity of speech data. To bridge this gap, we have developed the UY/CH-CHILD speech database, comprising 29,061 samples of Chinese words articulated by 106 Uyghur children from both kindergartens and primary schools. Syllables and tones in the database were carefully labelled by native Chinese speakers. To showcase the utility of this new resource, we conducted a comparative analysis of pronunciation errors between the kindergarten and primary school groups, revealing insights into how pronunciation proficiency evolves in Uyghur children as they mature. The database can be downloaded online at http://child.cslt.org.