Combined X-ray and facial videos for phoneme-level articulator dynamics

Hui Chen, Lan Wang, Wenxi Liu, Pheng-Ann Heng
DOI: https://doi.org/10.1007/s00371-010-0434-1
IF: 2.835
2010-01-01
The Visual Computer
Abstract: This paper integrates dynamic external and internal articulator motions into a low-cost, data-driven three-dimensional talking head. External and internal articulations are defined from facial video streams and videofluoroscopy and calibrated to a generic 3D talking head model. Three deformation modes are set up and integrated, corresponding to the pronunciation characteristics of the muscular soft tissue of the lips and tongue, the up-down movements of the chin, and the relatively fixed articulators. Shape blending functions among the segmented phonemes of natural speech input are synthesized within an utterance. Animations of confusable phonemes and minimal pairs are shown to English teachers and learners in a perception test. The results show that the proposed method realistically reflects actual phonetic pronunciation.