A Multitask Training Approach to Enhance Whisper with Open-Vocabulary Keyword Spotting

Yuang Li, Min Zhang, Chang Su, Yinglu Li, Xiaosong Qiao, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Shimin Tao, Hao Yang
DOI: https://doi.org/10.21437/interspeech.2024-104
2024-01-01
Abstract: The recognition of rare named entities, such as personal names and terminologies, is challenging for automatic speech recognition (ASR) systems, especially when they are not frequently observed in the training data. In this paper, we introduce keyword spotting enhanced Whisper (KWS-Whisper), a novel ASR system that leverages the Whisper model and performs open-vocabulary keyword spotting (OV-KWS) on the hidden states of the Whisper encoder to recognize user-defined named entities. These entities serve as prompts for the Whisper decoder. To optimize the model, we propose a multitask training approach that learns OV-KWS and contextual-ASR tasks. We evaluate our approach on Chinese Aishell hot word subsets and two internal code-switching test sets and show that it significantly improves the entity recall compared to the original Whisper model. Moreover, we demonstrate that the OV-KWS can be a plug-and-play module to enhance the ASR error correction methods and frozen Whisper models.
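The two-stage flow described in the abstract (spot user-defined entities from encoder hidden states, then pass the detected entities to the decoder as a prompt) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the cosine-similarity scoring, the 0.8 threshold, the comma-separated prompt format, and the names `spot_keywords` and `build_prompt` are all hypothetical, and the hand-written vectors stand in for real Whisper encoder outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def spot_keywords(frames, keyword_embeddings, threshold=0.8):
    """Return keywords whose best frame-level similarity reaches the threshold.

    Assumption: a real OV-KWS module is a trained network; cosine
    similarity is only a stand-in scoring function for this sketch.
    """
    detected = []
    for keyword, embedding in keyword_embeddings.items():
        best = max((cosine(frame, embedding) for frame in frames), default=0.0)
        if best >= threshold:
            detected.append(keyword)
    return detected


def build_prompt(detected):
    """Join detected entities into a decoder prompt (hypothetical format)."""
    return ", ".join(detected)


# Toy stand-ins for encoder hidden states and user-defined entity embeddings.
frames = [[1.0, 0.0], [0.0, 1.0]]
keyword_embeddings = {"Aishell": [0.9, 0.1], "Whisper": [-1.0, 0.0]}

detected = spot_keywords(frames, keyword_embeddings)
prompt = build_prompt(detected)
print(prompt)
```

Only the entities that pass the spotting step reach the prompt, which is the property that lets such a module sit in front of a frozen recognizer without changing it.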