Meta-Prompt: Boosting Whisper's Performance in Low-Resource Speech Recognition

Yaqi Chen, Tong Niu, Hao Zhang, Wenlin Zhang, Dan Qu
DOI: https://doi.org/10.1109/lsp.2024.3484328
2024-11-09
IEEE Signal Processing Letters
Abstract: Recent advancements in large-scale pre-trained automatic speech recognition (ASR) foundation models (e.g., Whisper) have exhibited remarkable performance in speech processing tasks. A recently emerging paradigm, prompt tuning, offers a parameter-efficient approach to fine-tuning and has proven effective in adapting pre-trained models to downstream tasks. In this paper, we first explore the prompting method for low-resource speech recognition based on Whisper. Although effective, it poses a challenge in the few-shot scenario due to its high sensitivity to initialization. To address this problem, we propose a novel meta-prompt for low-resource speech recognition that leverages the benefits of meta-learning for fast learning. Moreover, we present a lightweight version of meta-prompt that omits the learning of the encoder prompt, reducing computational and storage costs. Extensive experiments on the FLEURS dataset demonstrate consistent improvements across eleven target languages, showing better generalizability. Notably, meta-prompt in a 20%-shot setting achieves performance similar to prompt tuning in a 50%-shot setting, suggesting excellent few-shot learning ability.
engineering, electrical & electronic
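
The abstract's central idea, that a meta-learned initialization makes few-shot prompt tuning less sensitive to its starting point, can be illustrated with a toy sketch. The abstract does not say which meta-learning algorithm the authors use, so the code below assumes a first-order, Reptile-style update purely for illustration. A fixed linear map `W` stands in for the frozen Whisper backbone, and every name and hyperparameter (`adapt`, `meta_train`, `D`, the learning rates) is hypothetical rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy prompt dimension (assumption, not from the paper)

# Frozen "backbone": a fixed linear map standing in for the pre-trained model.
W = np.eye(D) + 0.1 * rng.normal(size=(D, D))

def loss_and_grad(prompt, target):
    """Squared error of the frozen map applied to the prompt, with gradient w.r.t. the prompt only."""
    err = W @ prompt - target
    return 0.5 * float(err @ err), W.T @ err

def adapt(prompt, target, steps, lr=0.1):
    """Prompt tuning: a few gradient steps on the prompt while W stays frozen."""
    p = prompt.copy()
    for _ in range(steps):
        _, grad = loss_and_grad(p, target)
        p -= lr * grad
    return p

def meta_train(source_targets, meta_steps=200, inner_steps=5, meta_lr=0.5):
    """Reptile-style meta-update (assumed): move the init toward each task's adapted prompt."""
    init = np.zeros(D)
    for i in range(meta_steps):
        target = source_targets[i % len(source_targets)]
        adapted = adapt(init, target, inner_steps)
        init += meta_lr * (adapted - init)
    return init

# Source "languages": related tasks whose targets cluster around a shared centre.
centre = 3.0 * rng.normal(size=D)
source_targets = [centre + 0.1 * rng.normal(size=D) for _ in range(10)]
meta_init = meta_train(source_targets)

# Unseen target task from the same family, adapted with a small step budget.
new_target = centre + 0.1 * rng.normal(size=D)
few_steps = 3
loss_zero, _ = loss_and_grad(adapt(np.zeros(D), new_target, few_steps), new_target)
loss_meta, _ = loss_and_grad(adapt(meta_init, new_target, few_steps), new_target)
print(f"zero init: {loss_zero:.4f}  meta init: {loss_meta:.4f}")
```

In this toy setting the meta-learned starting point reaches a lower loss than a zero initialization under the same small adaptation budget, which loosely mirrors the few-shot behaviour the abstract reports. It says nothing about the actual magnitudes observed on FLEURS.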