End-To-End Accent Conversion Without Using Native Utterances

Songxiang Liu, Disong Wang, Yuewen Cao, Lifa Sun, Xixin Wu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng
DOI: https://doi.org/10.1109/icassp40776.2020.9053797
2020-01-01
Abstract: Techniques for accent conversion (AC) aim to convert non-native-accented speech to native-accented speech. Conventional AC methods convert only the speaker identity of a native speaker's voice to that of the non-native-accented target speaker, leaving the underlying content and pronunciation unchanged. This hinders their practical use in real-world applications, because native-accented utterances are required at the conversion stage. In this paper, we present an end-to-end framework that can conduct AC from non-native-accented utterances without using any native-accented utterances during online conversion. We achieve this by independently extracting linguistic and speaker representations from non-native-accented speech and conditioning a speech synthesis model on these representations to generate native-accented speech. Experiments on open-source data corpora show that the proposed system can convert Hindi-accented English speech into native American English speech with high naturalness, and that the converted speech is indistinguishable from native-accented recordings in terms of accent.