Semi-Supervised Federated Learning for Keyword Spotting

Enmao Diao, Eric W. Tramel, Jie Ding, Tao Zhang
2023-05-09
Abstract: Keyword Spotting (KWS) is a critical aspect of audio-based applications on mobile devices and virtual assistants. Recent developments in Federated Learning (FL) have significantly expanded the ability to train machine learning models by utilizing the computational and private data resources of numerous distributed devices. However, existing FL methods typically require that devices possess accurate ground-truth labels, which can be both expensive and impractical when dealing with local audio data. In this study, we first demonstrate the effectiveness of Semi-Supervised Learning (SSL) and FL for KWS. We then extend our investigation to Semi-Supervised Federated Learning (SSFL) for KWS, where devices possess completely unlabeled data, while the server has access to a small amount of labeled data. We perform numerical analyses using state-of-the-art SSL, FL, and SSFL techniques to demonstrate that the performance of KWS models can be significantly improved by leveraging the abundant unlabeled heterogeneous data available on devices.
Machine Learning, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses several challenges in Keyword Spotting (KWS), particularly those that arise on mobile devices and virtual assistants. It focuses on the following aspects:

1. **Utilizing Unlabeled Data**: Existing Federated Learning (FL) methods typically require accurately labeled data on the device side, which is both expensive and impractical for local audio data. The researchers therefore propose a Semi-Supervised Federated Learning (SSFL) framework that uses the large amount of unlabeled data on devices together with only a small amount of labeled data on the server.
2. **Addressing the Non-Independent and Identically Distributed (Non-IID) Problem**: In federated learning, the data distribution can vary significantly across clients. The paper mitigates this issue and improves model performance through alternate training techniques that combine Semi-Supervised Learning (SSL) and federated learning.
3. **Application of Data Augmentation Techniques**: To make better use of unlabeled data, the researchers explore several data augmentation methods, including basic augmentation, SpecAugment, RandAugment, and MixAugment, to further improve the KWS model.
4. **Transfer of Pre-trained Models**: When a large amount of labeled data is available, SSFL can adapt to new data domains by fine-tuning pre-trained models. Experimental results show that starting training from pre-trained models, or transfer learning from them, can significantly improve KWS performance when only a small amount of labeled data is available.

In summary, the paper aims to improve keyword spotting performance by using the abundant unlabeled data on devices through a semi-supervised federated learning approach, and it proposes solutions for non-IID data.
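The SSFL setting summarized above (labeled data only on the server, unlabeled and non-IID data on devices, training that alternates between the two) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a softmax-regression model on 2-D toy features standing in for audio features, confidence-thresholded pseudo-labels on clients, and a FedAvg-style weighted average on the server. All function names, the 0.95 threshold, and the toy data are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def local_train(W, X, y, lr=0.5, steps=20):
    """A few full-batch gradient steps of softmax regression on (X, y)."""
    W = W.copy()
    onehot = np.eye(W.shape[1])[y]
    for _ in range(steps):
        W -= lr * X.T @ (softmax(X @ W) - onehot) / len(X)
    return W

def pseudo_label(W, X, threshold=0.95):
    """Keep only samples whose top predicted probability reaches the threshold."""
    probs = softmax(X @ W)
    keep = probs.max(axis=1) >= threshold
    return X[keep], probs.argmax(axis=1)[keep]

def ssfl_round(W, server_X, server_y, client_Xs, threshold=0.95):
    """One alternating round: server supervised step, then client pseudo-label steps."""
    # 1) The server trains on its small labeled set.
    W = local_train(W, server_X, server_y)
    # 2) Each client pseudo-labels its unlabeled data with the global model
    #    and trains locally on the confident samples only.
    updates, sizes = [], []
    for X_u in client_Xs:
        X_conf, y_hat = pseudo_label(W, X_u, threshold)
        if len(X_conf) == 0:
            continue  # no confident samples, so this client sits the round out
        updates.append(local_train(W, X_conf, y_hat))
        sizes.append(len(X_conf))
    # 3) The server averages client models, weighted by confident-sample count.
    if updates:
        W = np.average(np.stack(updates), axis=0, weights=sizes)
    return W

def toy_features(rng, y):
    """Hypothetical toy data: class 0 near (+2, +2), class 1 near (-2, -2)."""
    return rng.normal(size=(len(y), 2)) + np.where(y[:, None] == 0, 2.0, -2.0)

def skewed_labels(rng, n, frac_class0):
    """Draw n labels where roughly frac_class0 of them are class 0."""
    return (rng.random(n) >= frac_class0).astype(int)

rng = np.random.default_rng(0)
server_y = np.array([0, 1] * 5)          # only 10 labeled samples on the server
server_X = toy_features(rng, server_y)
# Non-IID clients: each holds unlabeled data skewed toward one class.
client_Xs = [toy_features(rng, skewed_labels(rng, 200, f)) for f in (0.9, 0.1, 0.8, 0.2)]
test_y = skewed_labels(rng, 1000, 0.5)
test_X = toy_features(rng, test_y)

W = np.zeros((2, 2))
for _ in range(5):
    W = ssfl_round(W, server_X, server_y, client_Xs)
acc = float((softmax(test_X @ W).argmax(axis=1) == test_y).mean())
print(f"test accuracy: {acc:.3f}")
```

The confidence threshold is the main design choice in this sketch: a higher value admits fewer but cleaner pseudo-labels, while a lower value uses more of each client's data at the risk of reinforcing the global model's mistakes.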