Speech synthesis of VCV sequence using a physiological articulatory model

Jianwu Dang, Kiyoshi Honda
DOI: https://doi.org/10.1121/1.425113
1999-01-01
Abstract: A 3-D articulatory model has been constructed based on volumetric MRI data for a Japanese male speaker. The model consists of the midsagittal layer of the tongue, jaw-hyoid bone complex, and vocal tract wall that comprise the main vocal tract. This work describes a multi-point control strategy for producing vowel-consonant-vowel sequences through the generation of muscle contraction parameters based on target-reaching tasks. In this method, three control points are chosen on the mandible, tongue tip and tongue dorsum, and the corresponding target points are defined as a fixed position for each phonetic segment of the utterance. The time sequences of muscle activation signals are determined by iterating model simulation to reduce the distance between the control and target points. The muscle activation signals are obtained in a space of muscle force vectors defined for each control point, and fed to the muscles to drive the model. Generated articulatory movements of the model derive the sequence of vocal tract area functions. Examples of the synthetic sounds are demonstrated using the area functions.
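The target-reaching iteration described in the abstract can be illustrated with a minimal 2-D sketch. This is not the authors' physiological model: the four muscle names, their unit force directions, the `gain` and `rate` parameters, and the point-mass `step` function are all hypothetical stand-ins chosen for illustration. The sketch only shows the general idea of projecting the control-to-target error onto muscle force vectors and iterating until the control point is close to its target.

```python
import math

# Hypothetical unit force directions for muscles acting on one control point.
# The real model uses anatomically derived force vectors; these are placeholders.
MUSCLE_DIRS = {
    "up": (0.0, 1.0),
    "down": (0.0, -1.0),
    "forward": (1.0, 0.0),
    "back": (-1.0, 0.0),
}


def activations(control, target, gain=1.0):
    """Project the control-to-target error onto each muscle force vector.

    Muscles can only pull, so negative projections are clipped to zero.
    """
    ex, ey = target[0] - control[0], target[1] - control[1]
    return {
        name: max(0.0, gain * (ex * dx + ey * dy))
        for name, (dx, dy) in MUSCLE_DIRS.items()
    }


def step(control, acts, rate=0.5):
    """Toy stand-in for the model simulation: move along the net muscle force."""
    fx = sum(a * MUSCLE_DIRS[m][0] for m, a in acts.items())
    fy = sum(a * MUSCLE_DIRS[m][1] for m, a in acts.items())
    return (control[0] + rate * fx, control[1] + rate * fy)


def reach(control, target, tol=1e-3, max_iter=100):
    """Iterate until the control point is within `tol` of the target.

    Returns the final position and the time sequence of activation signals.
    """
    history = []
    for _ in range(max_iter):
        if math.dist(control, target) < tol:
            break
        acts = activations(control, target)
        history.append(acts)
        control = step(control, acts)
    return control, history


if __name__ == "__main__":
    final, history = reach((0.0, 0.0), (1.0, 0.5))
    print("final position:", final)
    print("iterations:", len(history))
```

In this toy setting each iteration halves the remaining distance, so the activation signals decay as the control point approaches the target. A full VCV sequence would repeat the procedure for each of the three control points and for each phonetic segment's target.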