Automatic Labeling of Tibetan Prosodic Boundary Based on Speech Synthesis Tasks.
Zom Yang, Kuntharrgyal Khysru, Yi Zhu, Long Daijicuo, Jianguo Wei
DOI: https://doi.org/10.1145/3611380.3628558
2023-01-01
Abstract: Prosody is the highest-level expression of speech dynamics, reflected mainly in the pauses, tone intensity, accent, and rhythm of natural pronunciation. Prosodic labeling is an important factor in improving the naturalness of synthesized speech and in enhancing semantic understanding, and extracting prosodic information brings the output of speech synthesis closer to natural speech. In this paper, drawing on the theory of Tibetan grammar and the characteristics of Tibetan speech, we design a method for automatically labeling prosodic boundaries for the task of Tibetan speech synthesis. The method combines Tibetan text, acoustic features, and other characteristics of Tibetan speech. We validate the method on a Tibetan speech synthesis corpus of 20,975 items. Using the prosodic labeling rules, the F1 values obtained for prosodic words, prosodic phrases, and intonation phrases are 95%, 93.4%, and 90.4%, respectively, which supports the feasibility and soundness of mining regular pronunciation features of the Tibetan language for speech synthesis.
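The abstract reports one F1 value per boundary level. As a reminder of how such a score is commonly computed, the following is a minimal sketch, not the authors' evaluation procedure. The label names (`PW`, `PPH`, `IPH`, `NB` for "no boundary"), the function name `boundary_f1`, the position-by-position comparison, and the toy sequences are all assumptions made for illustration.

```python
from typing import Sequence


def boundary_f1(reference: Sequence[str], predicted: Sequence[str], level: str) -> float:
    """F1 for one boundary level, comparing labels position by position."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    # True positives: both sequences mark this level at the same position.
    tp = sum(r == level and p == level for r, p in pairs)
    # False positives: predicted marks the level where the reference does not.
    fp = sum(r != level and p == level for r, p in pairs)
    # False negatives: reference marks the level where predicted does not.
    fn = sum(r == level and p != level for r, p in pairs)

    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: one label per syllable boundary.
reference = ["PW", "NB", "PPH", "PW", "NB", "IPH"]
predicted = ["PW", "PW", "PPH", "NB", "NB", "IPH"]

for level in ("PW", "PPH", "IPH"):
    print(level, round(boundary_f1(reference, predicted, level), 3))
```

Scoring each level separately, as here, matches the way the abstract lists three distinct F1 values, although the paper may define matches differently (for example, by treating a higher-level boundary as also satisfying a lower-level one).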