Audiovisual Synthesis of Exaggerated Speech for Corrective Feedback in Computer-Assisted Pronunciation Training

Junhong Zhao, Hua Yuan, Wai-Kim Leung, Helen Meng, Jia Liu, Shanhong Xia
DOI: https://doi.org/10.1109/icassp.2013.6639267
2013-01-01
Abstract: In second language learning, learners' unawareness of the differences between correct and incorrect pronunciations is one of the largest obstacles to mispronunciation correction. To make corrective feedback easier to perceive and discriminate, this paper presents a novel method for generating it, namely exaggerated feedback. To produce the exaggeration effect, the neutral audio and visual speech are both exaggerated and then re-synthesized synchronously using audiovisual synthesis technology. Audio speech exaggeration is realized by adjusting the acoustic features related to the duration, pitch and energy of the speech according to different phoneme conditions. Visual speech exaggeration is realized by increasing the range of articulatory movement and slowing down the movement around the key actions. The results show that the proposed methods can effectively generate a bimodal exaggeration effect for feedback provision and make the feedback more perceptually distinctive.
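The two visual operations named in the abstract (enlarging the range of articulatory movement and slowing the motion around a key action) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the one-dimensional trajectory, the `rest` position, the `gain` factor, the `half_window` size and the integer `stretch` factor are all hypothetical choices made only for illustration.

```python
def amplify_range(trajectory, rest, gain):
    """Scale each frame's displacement from the rest position by `gain`.

    `trajectory` is a list of per-frame values of one articulatory
    parameter (a hypothetical example: lip opening). `gain` > 1 enlarges
    the movement range.
    """
    return [rest + gain * (x - rest) for x in trajectory]


def slow_around_key(trajectory, key_index, half_window, stretch):
    """Resample so the frames near `key_index` play `stretch` times slower.

    Frames within `half_window` of the key frame are linearly interpolated
    at a time step of 1/stretch (`stretch` is a positive integer); frames
    outside the window are copied unchanged.
    """
    start = max(0, key_index - half_window)
    end = min(len(trajectory) - 1, key_index + half_window)
    out = list(trajectory[:start])
    n_steps = (end - start) * stretch
    for i in range(n_steps + 1):
        t = start + i / stretch           # fractional frame position
        lo = int(t)
        hi = min(lo + 1, len(trajectory) - 1)
        frac = t - lo
        out.append((1 - frac) * trajectory[lo] + frac * trajectory[hi])
    out.extend(trajectory[end + 1:])
    return out


# Hypothetical neutral trajectory: the mouth opens and closes once.
neutral = [0.0, 0.5, 1.0, 0.5, 0.0]
wider = amplify_range(neutral, rest=0.0, gain=1.5)
slower = slow_around_key(wider, key_index=2, half_window=1, stretch=2)
```

In this toy setup the peak opening grows from 1.0 to 1.5, and the three frames around the peak are spread over five frames, so the key action is both larger and slower. An audio counterpart would apply analogous scaling to duration, pitch and energy, with factors chosen per phoneme condition as the abstract describes.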