Disentangling the Contribution of Non-native Speech in Automated Pronunciation Assessment

Shuju Shi, Kaiqi Fu, Yiwei Gu, Xiaohai Tian, Shaojun Gao, Wei Li, Zejun Ma
DOI: https://doi.org/10.21437/interspeech.2023-380
2023-01-01
Abstract: This study explores the impact of using non-native speech data in acoustic model training for pronunciation assessment systems. The goal is to determine how introducing non-native data in acoustic model training can influence alignment accuracy and assessment performance. Acoustic models are trained using different combinations of native and non-native speech data, and the Goodness of Pronunciation (GOP) metric is used to evaluate performance. Results show that models trained with manually labeled non-native data yield the highest assessment performance and alignment accuracy. Models trained with mixed non-native and native data perform best when considering the GOP distribution on both non-native and native speech. Additionally, models trained with native data are more robust to alignment variations. These findings highlight the importance of carefully selecting and incorporating non-native data in acoustic model training for pronunciation assessment systems.
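The abstract does not say which GOP variant the authors use. As a rough illustration only, the sketch below computes one common posterior-based formulation: the log ratio between the canonical phone's posterior and the best-scoring phone's posterior, averaged over the frames that the forced alignment assigns to the phone. The function name `gop_score`, the posterior matrix layout (frames by phones), and the toy values are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def gop_score(posteriors, start, end, phone):
    """Posterior-based GOP for one aligned phone segment (illustrative).

    posteriors: array of shape (num_frames, num_phones), rows sum to 1.
    start, end: frame range [start, end) assigned by forced alignment.
    phone:      index of the canonical (expected) phone.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    if not 0 <= start < end <= posteriors.shape[0]:
        raise ValueError("segment must be a non-empty range inside the utterance")

    segment = posteriors[start:end]
    eps = 1e-10  # guards against log(0)
    target = np.log(segment[:, phone] + eps)    # canonical phone, per frame
    best = np.log(segment.max(axis=1) + eps)    # best competing phone, per frame
    # Mean log ratio: 0 when the canonical phone wins every frame, negative otherwise.
    return float(np.mean(target - best))


# Toy posteriors for 3 frames over 3 phones (made-up values).
toy = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
])

well_aligned = gop_score(toy, 0, 2, phone=0)   # segment covers only phone-0 frames
shifted = gop_score(toy, 1, 3, phone=0)        # boundary shifted by one frame
```

In this toy case the shifted segment scores lower than the well-aligned one, because it absorbs a frame that belongs to a different phone. This is the sense in which alignment accuracy and GOP-based assessment are coupled in the abstract.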