A Speech Recognition Method Based on Transfer Learning for PSC Topic Speaking Section

Huazhen Meng, Jiajun Liu, Yunfei Shen, Zhixing Fan, Aishan Wumaier, Linna Zheng, Ruotong Yang
DOI: https://doi.org/10.1109/prml56267.2022.9882197
2022-01-01
Abstract: Many automatic speech scoring systems use an Automatic Speech Recognition (ASR) system to transcribe the speech and then carry out the remaining scoring steps on the transcription. Because the ASR output is used to compute the scoring model’s features, its accuracy significantly affects scoring accuracy. The National Putonghua Proficiency Test (NPPT) is an official proficiency test in China. It has four sections: monosyllabic-word reading, multisyllabic-word reading, short text reading, and topic speaking. At present, the topic speaking section must still be scored by two certified human testers. Developing an automatic speech scoring system for the NPPT therefore requires a robust ASR system. Although general Mandarin ASR systems can achieve high accuracy on general speech, most popular general ASR systems perform significantly worse on topic speaking speech, which contains more mistakes than general speech. This paper builds a speech recognition corpus to develop a specialized ASR system for the NPPT topic speaking scoring task and adopts Transformer and Conformer models to implement it. We also implement a transfer-learning-based ASR model built on the popular pre-trained Mandarin ASR framework WeNet. The experimental results show that our model significantly reduces the error rate on lower-proficiency speech while outperforming popular general-purpose Mandarin ASR systems.
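The abstract reports results as an error rate without naming the metric. As a rough illustration only, the sketch below assumes the character error rate (CER) commonly used for Mandarin ASR: the edit distance between reference and hypothesis characters divided by the reference length. The function names and the sample sentences are hypothetical and are not taken from the paper or from WeNet.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete cost 1)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance over characters / number of reference characters.

    Assumption: spaces are ignored, since Mandarin text is usually scored
    per character rather than per word.
    """
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical reference transcript and ASR output.
    print(character_error_rate("我喜欢读书", "我西欢读"))
```

Under this assumption, comparing a general-purpose system with a fine-tuned one would amount to averaging this ratio (or pooling the edit counts) over the same set of test utterances.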