Effective Integration of KAN for Keyword Spotting

Anfeng Xu, Biqiao Zhang, Shuyu Kong, Yiteng Huang, Zhaojun Yang, Sangeeta Srivastava, Ming Sun
2024-09-13
Abstract: Keyword spotting (KWS) is an important speech processing component for smart devices with voice assistance capability. In this paper, we investigate if Kolmogorov-Arnold Networks (KAN) can be used to enhance the performance of KWS. We explore various approaches to integrate KAN for a model architecture based on 1D Convolutional Neural Networks (CNN). We find that KAN is effective at modeling high-level features in lower-dimensional spaces, resulting in improved KWS performance when integrated appropriately. The findings shed light on understanding KAN for speech processing tasks and on other modalities for future researchers.
Audio and Speech Processing, Sound
What problem does this paper attempt to address?
This paper investigates whether Kolmogorov-Arnold Networks (KAN) can improve performance on the keyword spotting (KWS) task. The authors explore several ways of integrating KAN into an architecture based on 1D Convolutional Neural Networks (1D CNN). They find that KAN is most effective at modeling high-level features in lower-dimensional spaces. In their experiments, adding a GKAN layer to the 1D CNN significantly reduces the False Reject Rate (FRR) and makes the model more robust under noisy conditions. An analysis across model sizes indicates that GKAN retains an advantage even in small models. Overall, the work offers guidance for future researchers on integrating KAN effectively in speech processing and in other modalities.
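To make the idea of placing a KAN layer on top of low-dimensional, high-level CNN features concrete, here is a minimal NumPy sketch. It is not the authors' GKAN layer or their architecture. The learnable univariate functions use Gaussian radial basis functions rather than the B-splines of the original KAN formulation. The names `conv1d_relu`, `RBFKANLayer`, and `demo_forward`, along with all layer sizes and the input shape, are hypothetical choices made for illustration only.

```python
import numpy as np


def conv1d_relu(x, w, b):
    """Valid 1D convolution followed by ReLU.

    x: (C_in, T) input features, w: (C_out, C_in, K) kernels, b: (C_out,) biases.
    """
    k = w.shape[2]
    t_out = x.shape[1] - k + 1
    # windows[c, t, i] = x[c, t + i], shape (C_in, T_out, K)
    windows = np.stack([x[:, i:i + t_out] for i in range(k)], axis=-1)
    y = np.einsum("ctk,ock->ot", windows, w) + b[:, None]
    return np.maximum(y, 0.0)


class RBFKANLayer:
    """KAN-style layer: out[o] = sum_i phi_{o,i}(x[i]).

    Each edge function phi_{o,i} is a learnable univariate function, here
    parameterized as a weighted sum of Gaussian bumps on a fixed grid
    (an assumption of this sketch, swapped in for B-splines).
    """

    def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0), rng=None):
        rng = np.random.default_rng() if rng is None else rng
        lo, hi = grid_range
        self.centers = np.linspace(lo, hi, num_basis)
        self.width = (hi - lo) / (num_basis - 1)
        # One coefficient vector per (output, input) edge.
        self.coef = rng.normal(0.0, 0.1, size=(out_dim, in_dim, num_basis))

    def __call__(self, x):
        # x: (batch, in_dim) -> basis: (batch, in_dim, num_basis)
        basis = np.exp(-(((x[..., None] - self.centers) / self.width) ** 2))
        # Sum the univariate edge functions over the input dimension.
        return np.einsum("bik,oik->bo", basis, self.coef)


def demo_forward(seed=0):
    """Run one random utterance through a toy CNN -> bottleneck -> KAN stack."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(40, 100))             # assumed: 40 feature bins x 100 frames
    w = rng.normal(0.0, 0.1, size=(16, 40, 5))
    h = conv1d_relu(x, w, np.zeros(16))        # (16, 96) feature map
    pooled = h.mean(axis=1)                    # global average pooling over time
    proj = rng.normal(0.0, 0.1, size=(16, 4))
    z = np.tanh(pooled @ proj)[None, :]        # (1, 4) low-dimensional bottleneck
    kan = RBFKANLayer(in_dim=4, out_dim=2, rng=rng)
    return kan(z)                              # (1, 2) keyword / non-keyword scores
```

In this sketch the KAN layer only sees a 4-dimensional bottleneck, which keeps the number of edge functions small (in_dim × out_dim) and mirrors the summary's observation that KAN works best on high-level features in lower-dimensional spaces. The weights are random, so the scores are meaningless until the model is trained.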