Zero-shot Mispronunciation Detection by Knowledge-based Data Augmentation

Zhenghai You, Mewlude Nijat, Ying Shi, Chen, Wenqiang Du, Askar Hamdulla, Dong Wang
DOI: https://doi.org/10.1109/o-cocosda60357.2023.10482946
2023-01-01
Abstract: We propose a zero-shot mispronunciation detection approach that requires no non-native data for model training. Central to our method is a knowledge-based data augmentation process, which synthesizes mispronunciations according to the typical error patterns of the target group; the synthesized data is then used to train an SVM as the detection model. To validate our approach, we constructed a new L2 speech dataset named UY/CH-CHILD, which comprises L2 Chinese speech samples from Uyghur children. Experimental results suggest that the knowledge-based augmentation strategy enables effective detection of pronunciation mistakes made by non-native children. Interestingly, with this zero-shot training, the performance of the detection system is on par with that of native human annotators. The dataset and code will be available online.