A Few-Shot Speech Keyword Spotting Method Based on Self-Supervised Learning

Mingdong Yu, Xiaofeng Jin, Bangxian Wan, Guirong Wang
DOI: https://doi.org/10.1109/CISP-BMEI60920.2023.10373303
2023-01-01
Abstract: Keyword spotting (KWS) plays a crucial role in enabling voice-based user interaction on smart devices. However, conventional KWS methods need a large number of training samples of the predefined keywords to reach acceptable detection accuracy, and users may find it difficult to provide that much data. In recent years, self-supervised training and large models have performed strongly across a range of audio tasks. Their general-purpose audio feature extraction suits both the low-resource nature and the scalability requirements of KWS. In this paper, we integrate self-supervised models with keyword transformers to adapt them to KWS. Experiments show that our approach significantly outperforms previous supervised methods. Moreover, its advantage grows under extremely limited resource conditions, which is important for the rapid deployment of KWS systems.
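The abstract says only that self-supervised models are combined with keyword transformers. It gives no architecture details. The pure-Python toy below illustrates the general idea and rests on several assumptions:

- Frame-level features come from a frozen self-supervised encoder. Here they are simulated with synthetic vectors.
- The features are pooled through a single KWT-style attention step, in which a prepended class token acts as the query.
- Only a linear keyword classifier is trained, from three examples per keyword.

All names (`FewShotKeywordHead`, `attend_with_class_token`, `synthetic_utterance`) are hypothetical. The identity attention projections and the fixed class token are deliberate simplifications. Treat this as a sketch of the concept, not as the authors' implementation.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend_with_class_token(frames, cls_token):
    """Single-head scaled dot-product attention with the class token as the query.

    `frames` is a list of equal-length feature vectors, assumed to come from a
    self-supervised encoder. Query, key, and value projections are identity here,
    a simplification: a real transformer layer learns them.
    """
    dim = len(cls_token)
    tokens = [cls_token] + frames  # prepend the class token, KWT-style
    scores = [sum(q * k for q, k in zip(cls_token, tok)) / math.sqrt(dim) for tok in tokens]
    weights = softmax(scores)
    # Attention output for the class token: weighted sum of all token vectors.
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]


class FewShotKeywordHead:
    """Hypothetical classifier: attention-pooled frame features -> linear layer -> keyword logits."""

    def __init__(self, dim, num_keywords, lr=0.5):
        self.cls = [1.0 / math.sqrt(dim)] * dim  # fixed here; learned in a real KWT
        self.W = [[0.0] * dim for _ in range(num_keywords)]
        self.b = [0.0] * num_keywords
        self.lr = lr

    def pooled(self, frames):
        return attend_with_class_token(frames, self.cls)

    def _logits(self, h):
        return [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(self.W, self.b)]

    def predict(self, frames):
        z = self._logits(self.pooled(frames))
        return max(range(len(z)), key=z.__getitem__)

    def fit(self, examples, epochs=200):
        """SGD on softmax cross-entropy. Only W and b are updated: the encoder
        features and the attention pooling stay frozen (an assumption)."""
        for _ in range(epochs):
            for frames, label in examples:
                h = self.pooled(frames)
                probs = softmax(self._logits(h))
                for k, p in enumerate(probs):
                    grad = p - (1.0 if k == label else 0.0)  # d(loss)/d(logit_k)
                    self.b[k] -= self.lr * grad
                    for i, x in enumerate(h):
                        self.W[k][i] -= self.lr * grad * x


def synthetic_utterance(center, n_frames=5, noise=0.1):
    """Hypothetical stand-in for encoder output: frames scattered around a per-keyword center."""
    return [[c + random.gauss(0.0, noise) for c in center] for _ in range(n_frames)]


random.seed(0)
centers = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
support = [(synthetic_utterance(c), y) for y, c in enumerate(centers) for _ in range(3)]  # 3 shots each
query = [(synthetic_utterance(c), y) for y, c in enumerate(centers) for _ in range(5)]

head = FewShotKeywordHead(dim=4, num_keywords=3)
head.fit(support)
accuracy = sum(head.predict(f) == y for f, y in query) / len(query)
print(f"query accuracy: {accuracy:.2f}")
```

Keeping the pretrained encoder frozen and fitting only a small head is one common way to use pretrained features when labeled keyword samples are scarce. A fuller sketch would stack several transformer layers with learned projections and a learned class token. Depending on the method, it might also fine-tune part of the encoder.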