Boosting Character-based Mandarin ASR via Chinese Pinyin Representation
Li Li, Yanhua Long, Dongxing Xu, Yijie Li
DOI: https://doi.org/10.1007/s10772-023-10050-z
2023-11-09
International Journal of Speech Technology
Abstract: Current end-to-end automatic speech recognition (ASR) models have achieved good results for phonetic languages such as English and French. Chinese, however, uses a typical ideographic writing system, and there is no direct correspondence between Chinese characters and their pronunciation. Pinyin, which marks the pronunciation of Chinese characters, is intrinsically connected to them. It is therefore crucial to introduce Pinyin into Mandarin ASR to assist the target-unit modeling of end-to-end speech recognition systems. In this work, we propose a method to boost a character-based Conformer-Transducer Mandarin ASR system via Chinese Pinyin representation. Specifically, we investigate four new frameworks to enhance the target-unit modeling ability of end-to-end Mandarin ASR: a PCM, a PCM with PTLE, a PCM with PALE, and a TAE. They integrate Pinyin and character information through different implementations within a CTC/transducer multi-task training framework. Experiments on both Aishell and accented ASR tasks show that the proposed method significantly outperforms the conventional character-based model, reducing the character error rate on the Aishell test set and three accented ASR test sets by a relative 6.4%, 5.6%, 4.2%, and 4.5%, respectively.
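The abstract reports its gains as relative reductions in character error rate (CER). As a minimal sketch of how such figures are typically computed, the Python below derives CER from a character-level edit distance and then the relative reduction between a baseline and an improved system. The function names, the example sentences, and the numeric CER values are illustrative assumptions and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative error reduction of `proposed` with respect to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - proposed) / baseline


if __name__ == "__main__":
    # Hypothetical reference/hypothesis pair with one substituted character.
    print(f"CER: {cer('今天天气很好', '今天天汽很好'):.4f}")
    # Hypothetical CER values (in percent), chosen only for illustration.
    print(f"relative reduction: {relative_reduction(5.0, 4.68):.3f}")
```

A relative reduction is measured against the baseline error rate, so a relative 6.4% gain corresponds to a much smaller absolute change in CER.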