Towards Language-universal Mandarin-English Speech Recognition with Unsupervised Label Synchronous Adaptation
Song Li, Haoneng Luo, Wenxuan Hu, Yuan Liu, Shiliang Zhang, Lin Li, Qingyang Hong
DOI: https://doi.org/10.1109/ISCSLP57327.2022.10037997
2022-01-01
Abstract: End-to-end multilingual speech recognition and code-switching speech recognition are two challenging tasks that have been studied separately in much previous work. In this work, we study the multilingual and code-switching problems jointly and present a novel unsupervised label synchronous adaptation algorithm for Mandarin-English speech recognition. Specifically, we use two parallel encoders to decompose the Mel spectrum of speech into semantic information and other acoustic attributes, such as speaker identity, accent, and the pronunciation characteristics of different languages. During autoregressive decoding, an adaptive decoder runs in parallel with the speech recognition decoder and generates an adaptive embedding for each character, allowing the speech recognition model to adapt to Mandarin, English, and code-switching inputs. Experiments show that the proposed algorithm achieves a 13.5% relative error reduction over a strong baseline in the code-switching case and outperforms both state-of-the-art Mandarin and English monolingual models.
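The decoding scheme described in the abstract can be sketched as a toy, pure-Python model. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and attribute names (`ToyLabelSyncASR`, `semantic_enc`, `attribute_enc`, `adapt_dec`) are hypothetical, the weights are random rather than trained, mean pooling stands in for attention, each decoder step conditions only on the previous token, and summing the ASR decoder state with the adaptive embedding before the output projection is an assumed fusion method. What the sketch is meant to show is the label-synchronous structure: at every autoregressive step the adaptive decoder runs alongside the recognition decoder and contributes one adaptive embedding for the character being emitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def tanh_vec(x: Sequence[float]) -> Vector:
    return [math.tanh(v) for v in x]


def add_vec(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x + y for x, y in zip(a, b)]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyLabelSyncASR:
    """Hypothetical toy model: two parallel encoders feeding two parallel decoders."""

    def __init__(self, feat_dim: int, hid_dim: int, vocab_size: int, seed: int = 0):
        rng = random.Random(seed)
        # Encoder 1: assumed to capture semantic (linguistic) content.
        self.semantic_enc = rand_matrix(hid_dim, feat_dim, rng)
        # Encoder 2: assumed to capture speaker, accent, and language cues.
        self.attribute_enc = rand_matrix(hid_dim, feat_dim, rng)
        self.token_emb = rand_matrix(vocab_size, hid_dim, rng)
        self.asr_dec = rand_matrix(hid_dim, hid_dim, rng)    # recognition decoder
        self.adapt_dec = rand_matrix(hid_dim, hid_dim, rng)  # adaptive decoder
        self.out_proj = rand_matrix(vocab_size, hid_dim, rng)

    def encode(self, enc: Matrix, frames: List[Vector]) -> Vector:
        # Per-frame projection + tanh, then mean pooling (a stand-in for attention).
        hidden = [tanh_vec(matvec(enc, f)) for f in frames]
        return [sum(col) / len(hidden) for col in zip(*hidden)]

    def greedy_decode(
        self, frames: List[Vector], sos: int = 0, eos: int = 1, max_len: int = 10
    ) -> Tuple[List[int], List[Vector]]:
        sem = self.encode(self.semantic_enc, frames)
        attr = self.encode(self.attribute_enc, frames)
        tokens: List[int] = []
        adaptive_embs: List[Vector] = []
        prev = sos
        for _ in range(max_len):
            prev_emb = self.token_emb[prev]
            # Recognition decoder step, driven by the semantic branch.
            asr_state = tanh_vec(matvec(self.asr_dec, add_vec(sem, prev_emb)))
            # Adaptive decoder step, run in parallel on the attribute branch:
            # one adaptive embedding per output label (label synchronous).
            adaptive = tanh_vec(matvec(self.adapt_dec, add_vec(attr, prev_emb)))
            # Assumed fusion: add the adaptive embedding before the output layer.
            logits = matvec(self.out_proj, add_vec(asr_state, adaptive))
            prev = max(range(len(logits)), key=logits.__getitem__)
            if prev == eos:
                break
            tokens.append(prev)
            adaptive_embs.append(adaptive)
        return tokens, adaptive_embs


if __name__ == "__main__":
    rng = random.Random(42)
    # 20 frames of 8-dimensional random features standing in for a Mel spectrum.
    frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
    model = ToyLabelSyncASR(feat_dim=8, hid_dim=16, vocab_size=12, seed=0)
    tokens, adaptive = model.greedy_decode(frames)
    print(tokens, len(adaptive))
```

Because the adaptive embedding is produced per output label rather than per acoustic frame, `len(adaptive)` always equals `len(tokens)` in this sketch. In the framing of the abstract, a trained attribute branch would carry speaker, accent, and language information that the semantic branch is meant to factor out.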