Noise-Robust Keyword Spotting through Self-supervised Pretraining

Jacob Mørk, Holger Severin Bovbjerg, Gergely Kiss, Zheng-Hua Tan
2024-03-27
Abstract: Voice assistants are now widely available, and a keyword spotting (KWS) algorithm is used to activate them. Modern KWS systems are mainly trained using supervised learning methods and require a large amount of labelled data to achieve good performance. Leveraging unlabelled data through self-supervised learning (SSL) has been shown to increase accuracy in clean conditions. This paper explores how SSL pretraining, such as Data2Vec, can be used to enhance the robustness of KWS models in noisy conditions, an area that remains under-explored.
Audio and Speech Processing, Machine Learning, Sound