Multimodal Automatic Speech Fluency Evaluation Method for Putonghua Proficiency Test Propositional Speaking Section

Jiajun Liu, Huazhen Meng, Yunfei Shen, Linna Zheng, Aishan Wumaier
DOI: https://doi.org/10.1109/iscslp57327.2022.10037908
2022-01-01
International Journal of Asian Language Processing
Abstract: The Putonghua Proficiency Test (Putonghua Shuiping Ceshi, PSC) is a valid form of speaking test in China. The propositional speaking section of the PSC assesses the speaker's ability to express ideas fluently and accurately without a textual reference. However, unlike the other sections of the PSC, the propositional speaking section is still scored manually. To address the inefficiency, high cost, and subjectivity of manual scoring in this section, a multimodal method based on the textual and acoustic modalities is proposed for automatic speech fluency evaluation. First, different neural networks are used to extract unimodal features. Then, cross-modal attention is applied to achieve multimodal fusion. Finally, self-attention is applied to reinforce the information that contributes most, and the fluency evaluation results are obtained. On a self-built dataset, the proposed method achieves an accuracy of 81.67% for automatic speech fluency evaluation. This shows that the textual and acoustic features used in this paper provide complementary information that improves the accuracy of fluency evaluation, and that the fused features can be applied effectively to automatic speech fluency evaluation tasks.
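The abstract names the pipeline stages (unimodal feature extraction, cross-modal attention fusion, self-attention, classification) but not the layer types, feature dimensions, pooling, or number of fluency levels. The following is a minimal NumPy sketch of the fusion and scoring stages under those gaps, not the authors' implementation. Everything specific here is an assumption: the neural feature extractors are replaced by precomputed feature matrices, and the class name `FluencyScorerSketch`, the dimensions, mean pooling over time, the use of three fluency levels, and the random untrained weights are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


class FluencyScorerSketch:
    """Hypothetical text + acoustic fusion model; weights are random, not trained."""

    def __init__(self, d_text: int, d_audio: int, d_model: int, n_levels: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        # Project each modality's features into a shared d_model space.
        self.w_text = rng.normal(0.0, scale, (d_text, d_model))
        self.w_audio = rng.normal(0.0, scale, (d_audio, d_model))
        # Query/key/value projections for the self-attention stage.
        self.w_q, self.w_k, self.w_v = (rng.normal(0.0, scale, (d_model, d_model)) for _ in range(3))
        # Linear classifier over the (assumed) fluency levels.
        self.w_out = rng.normal(0.0, scale, (d_model, n_levels))
        self.b_out = np.zeros(n_levels)

    def predict_proba(self, text_feats: np.ndarray, audio_feats: np.ndarray) -> np.ndarray:
        t = text_feats @ self.w_text    # (T_text, d_model)
        a = audio_feats @ self.w_audio  # (T_audio, d_model)
        # Cross-modal attention: each modality queries the other one.
        text_attends_audio = attention(t, a, a)  # (T_text, d_model)
        audio_attends_text = attention(a, t, t)  # (T_audio, d_model)
        fused = np.concatenate([text_attends_audio, audio_attends_text], axis=0)
        # Self-attention over the fused sequence, then mean-pool over time.
        h = attention(fused @ self.w_q, fused @ self.w_k, fused @ self.w_v)
        pooled = h.mean(axis=0)  # (d_model,)
        return softmax(pooled @ self.w_out + self.b_out)

    def predict(self, text_feats: np.ndarray, audio_feats: np.ndarray) -> int:
        """Index of the most probable fluency level."""
        return int(np.argmax(self.predict_proba(text_feats, audio_feats)))


def accuracy(predicted, gold) -> float:
    """Fraction of utterances whose predicted fluency level matches the label."""
    predicted, gold = np.asarray(predicted), np.asarray(gold)
    return float((predicted == gold).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    model = FluencyScorerSketch(d_text=768, d_audio=40, d_model=64, n_levels=3)
    text = rng.normal(size=(20, 768))   # stand-in for 20 token embeddings
    audio = rng.normal(size=(300, 40))  # stand-in for 300 acoustic frames
    print(model.predict_proba(text, audio).shape)  # (3,)
    print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))    # 0.75
```

Letting each modality query the other, rather than simply concatenating the two feature sets, is one common way to let textual and acoustic cues inform each other before scoring. The reported accuracy figure corresponds to the `accuracy` computation above: the share of utterances assigned the correct fluency level.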