Extending Multilingual ASR to New Languages Using Supplementary Encoder and Decoder Components

Yerbolat Khassanov, Zhipeng Chen, Tianfeng Chen, Tze Yuang Chong, Wei Li, Lu, Zejun Ma
DOI: https://doi.org/10.1109/icassp48485.2024.10446800
2024-01-01
Abstract: Extending multilingual automatic speech recognition (mASR) systems to new languages poses challenges, particularly when training data for existing languages is limited or unavailable. To tackle this issue, we suggest utilizing supplementary encoder and decoder components. Specifically, we propose appending and fine-tuning a distinct decoder designed for new languages, while preserving the parameters of existing languages to minimize disruption to their performance. Furthermore, we advocate attaching an additional encoder component to enhance acoustic representation learning for new languages, resulting in substantial improvements in word error rate performance. Our experimental findings demonstrate the effectiveness of the proposed methods for the task of extending language support within mASR systems.
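The idea in the abstract (keep the existing parameters frozen, train only a new decoder and an optional supplementary encoder) can be illustrated with a toy sketch. This is not the authors' implementation: the `Component` and `ExtendedASR` classes, the element-wise "layers", the choice to sum the two encoder outputs, and the language codes `"en"` and `"kk"` are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Component:
    """A named block of weights; frozen unless `trainable` is set."""
    name: str
    weights: List[float]
    trainable: bool = False

    def apply(self, features: List[float]) -> List[float]:
        # Toy "layer": element-wise scaling stands in for a real network.
        return [w * x for w, x in zip(self.weights, features)]


@dataclass
class ExtendedASR:
    base_encoder: Component                    # existing, frozen
    base_decoder: Component                    # existing, frozen
    new_decoder: Component                     # appended for new languages
    extra_encoder: Optional[Component] = None  # supplementary encoder
    new_languages: frozenset = frozenset()

    def forward(self, features: List[float], language: str) -> List[float]:
        encoded = self.base_encoder.apply(features)
        if language not in self.new_languages:
            # Existing languages use only the untouched original path.
            return self.base_decoder.apply(encoded)
        if self.extra_encoder is not None:
            # Assumption: the two encoder outputs are combined by summation.
            extra = self.extra_encoder.apply(features)
            encoded = [a + b for a, b in zip(encoded, extra)]
        return self.new_decoder.apply(encoded)

    def components(self) -> List[Component]:
        parts = [self.base_encoder, self.base_decoder, self.new_decoder]
        if self.extra_encoder is not None:
            parts.append(self.extra_encoder)
        return parts

    def sgd_step(self, grads: Dict[str, List[float]], lr: float = 0.1) -> None:
        # Only trainable components are updated; frozen ones are skipped.
        for part in self.components():
            if part.trainable and part.name in grads:
                part.weights = [
                    w - lr * g for w, g in zip(part.weights, grads[part.name])
                ]


model = ExtendedASR(
    base_encoder=Component("base_encoder", [1.0, 2.0]),
    base_decoder=Component("base_decoder", [0.5, 0.5]),
    new_decoder=Component("new_decoder", [1.0, 1.0], trainable=True),
    extra_encoder=Component("extra_encoder", [0.0, 0.0], trainable=True),
    new_languages=frozenset({"kk"}),  # hypothetical new language
)

before = model.forward([1.0, 1.0], "en")
# Made-up gradients for every component, to show that frozen ones ignore them.
fake_grads = {part.name: [1.0, 1.0] for part in model.components()}
model.sgd_step(fake_grads)
after = model.forward([1.0, 1.0], "en")
```

In this sketch the output for the existing language is identical before and after the update, because its path only touches frozen components, while the new decoder and supplementary encoder weights do change.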