Decoding the Dancing of the Tongue: A Model-Based Learning Approach to Phonetic Targets in Coarticulation

Jianguo Wei, Guochen Bai, Wenhuan Lu, Jianwu Dang
DOI: https://doi.org/10.1121/10.0032362
2024-01-01
Abstract: A model synthesizing average frequency components from select sentences in an electromagnetic articulography database has been crafted. This revealed the dual roles of the tongue: its dorsum acts like a carrier wave, and the tip acts as a modulation signal within the articulatory realm. This model illuminates anticipatory coarticulation's subtleties during speech planning. It undergoes rigorous, two-stage optimization: statistical estimation and refinement to depict carryover and anticipation. The model's base, rooted in physiological insights, deciphers carryover targets while its upper layer captures anticipation. Optimization has pinpointed unique phonetic targets for each phoneme, providing deep insights into virtual target formation during speech planning. These simulations, aligning closely with empirical data and marked by a mere 0.18 cm average error, along with extensive listening tests attest to the model's accuracy and enhanced speech synthesis quality.