A Speech-Driven 3-D Tongue Model with Realistic Movement in Mandarin Chinese.
Changwei Liang,Jiangping Kong,Xiyu Wu
DOI: https://doi.org/10.1145/3448748.3448796
2021-01-01
Abstract:In this paper, a new speech driven 3-D geometric tongue model is constructed. The constructed 3-D tongue shape is controlled with control points on 2-D midsagittal tongue curve, and speech-driven inverse estimation based on the constructed model is evaluated by empirical data. X-Ray 2-D vocal tract motion videos are tagged for the midsagittal tongue motion, and static 3-D vocal tracts of 20 phonemes are collected with MRI for the realistic 3-D tongue shape. MFCC are calculated from the videos as acoustic features, and are then used in a LSTM-RNN to predict the control points movement of the tongue shape. Three geometrically intuitive control points are selected to represent and calculate the midsagittal line of the tongue through linear regression. Cross-sections on the central lines of the tongues, whose height, width and angle are then predicted from the midsagittal line, are reconstructed with geometric curves, and the shape of each cross-section are then placed on the midsagittal line to get the overall predicted moving grid of the 3-D tongue. In this 3-D tongue model, acoustic features and realistic tongue motion are mapped directly to preserve more realistic articulatory details, and the control points are intuitive for non-experts to control the model, and the geometric tongue shapes predicted are comparable with realistic tongue dynamics. Based on the proposed method, the speech-driven prediction is evaluated with the realistic data, which proved this proposed method feasible.