Federated Learning with Differential Privacy for End-to-End Speech Recognition

Martin Pelikan, Sheikh Shams Azam, Vitaly Feldman, Jan "Honza" Silovsky, Kunal Talwar, Tatiana Likhomanenko
2023-09-30
Abstract: While federated learning (FL) has recently emerged as a promising approach to train machine learning models, its application to automatic speech recognition (ASR) has been limited to preliminary explorations. Moreover, FL does not inherently guarantee user privacy and requires differential privacy (DP) for robust privacy guarantees. However, we are not aware of prior work on applying DP to FL for ASR. In this paper, we aim to bridge this research gap by formulating an ASR benchmark for FL with DP and establishing the first baselines. First, we extend the existing research on FL for ASR by exploring different aspects of recent *large end-to-end transformer models*: architecture design, seed models, data heterogeneity, domain shift, and impact of cohort size. With a *practical* number of central aggregations, we are able to train **FL models** that are **nearly optimal** even with heterogeneous data, a seed model from another domain, or no pre-trained seed model. Second, we apply DP to FL for ASR, which is non-trivial since DP noise severely affects model training, especially for large transformer models, due to highly imbalanced gradients in the attention block. We counteract the adverse effect of DP noise by reviving per-layer clipping and explaining why its effect is more apparent in our case than in prior work. Remarkably, we achieve user-level $(7.2, 10^{-9})$-**DP** (resp. $(4.5, 10^{-9})$-**DP**) with a 1.3% (resp. 4.6%) absolute drop in word error rate for extrapolation to high (resp. low) population scale for **FL with DP in ASR**.
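The DP mechanism described in the abstract can be sketched as one round of user-level DP federated averaging with per-layer clipping. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer names, the per-layer bounds `bounds`, the `noise_multiplier`, and the function names are hypothetical; privacy accounting across rounds (which is what yields a final $(\epsilon, \delta)$) is omitted; and averaging by the realized cohort size is a simplification.

```python
import math
import random

# Hypothetical representation: layer name -> flattened parameter delta for one user.
Update = dict[str, list[float]]


def l2_norm(v: list[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def clip_per_layer(update: Update, bounds: dict[str, float]) -> Update:
    """Scale each layer's delta independently so its L2 norm is at most bounds[layer]."""
    clipped = {}
    for name, delta in update.items():
        norm = l2_norm(delta)
        scale = min(1.0, bounds[name] / norm) if norm > 0 else 1.0
        clipped[name] = [x * scale for x in delta]
    return clipped


def dp_fedavg_round(updates: list[Update], bounds: dict[str, float],
                    noise_multiplier: float, rng: random.Random) -> Update:
    """One central aggregation: clip per layer, sum over the cohort, add Gaussian noise, average."""
    clipped = [clip_per_layer(u, bounds) for u in updates]
    # Each clipped update has total L2 norm <= sqrt(sum_l C_l^2), which is therefore the
    # user-level L2 sensitivity of the sum; one noise scale is used for all coordinates.
    sensitivity = math.sqrt(sum(c * c for c in bounds.values()))
    sigma = noise_multiplier * sensitivity
    # Simplification: practical systems divide by a fixed (expected) cohort size instead.
    cohort_size = len(updates)
    aggregate: Update = {}
    for name in bounds:
        dim = len(clipped[0][name])
        summed = [sum(u[name][i] for u in clipped) for i in range(dim)]
        aggregate[name] = [(s + rng.gauss(0.0, sigma)) / cohort_size for s in summed]
    return aggregate


if __name__ == "__main__":
    rng = random.Random(0)
    cohort = [
        {"attention": [30.0, 40.0], "ffn": [0.03, 0.04]},  # imbalanced: large attention delta
        {"attention": [0.0, 0.5], "ffn": [0.1, 0.0]},
    ]
    bounds = {"attention": 0.5, "ffn": 0.5}  # hypothetical per-layer bounds C_l
    print(dp_fedavg_round(cohort, bounds, noise_multiplier=1.0, rng=rng))
```

Calibrating the noise to $\sqrt{\sum_l C_l^2}$ keeps the per-round mechanism a standard Gaussian mechanism, so per-layer clipping changes how the clipping budget is distributed across layers, not the noise level for a given total bound.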
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
The paper aims to address the following issues:

1. **Application of Federated Learning (FL) to Automatic Speech Recognition (ASR)**: Although FL has become a promising method for training machine learning models, its application to ASR has so far been limited to preliminary explorations. The paper extends existing research by exploring architecture design, seed models, data heterogeneity, domain shift, and the impact of cohort size (the number of clients participating in each central aggregation).
2. **Combining Differential Privacy (DP) with FL for ASR**: FL by itself does not guarantee user privacy and must be combined with DP to obtain robust privacy guarantees. However, the authors are not aware of prior work applying DP to FL for ASR. The paper fills this gap by formulating an ASR benchmark for FL with DP and establishing the first baselines.

Specifically, the paper conducts research in the following areas:

- Exploring the performance of large end-to-end Transformer models in FL, including the impact of model architecture, seed model (initialization), and data distribution on model performance.
- Applying DP to FL for ASR and reviving per-layer clipping to counteract DP noise, which severely affects the training of large Transformer models because gradients in the attention block are highly imbalanced.
- Validating FL models on datasets in multiple languages, including English, German, and French, and demonstrating that FL can train models close to the level of centralized training even with heterogeneous data, a seed model from another domain, or no pre-trained seed model.

Through these studies, the paper provides a benchmark and the first baselines for the practical application of FL with DP to ASR.
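A toy calculation suggests why per-layer clipping can help when gradients are imbalanced across layers, as reported for the attention block. This is a generic illustration, not the paper's analysis: it assumes a hypothetical two-layer update in which one layer's norm is 100 times the other's, and an equal split of the clipping budget, $C_l = C/\sqrt{L}$, chosen only for illustration. Both schemes have the same total sensitivity $C$ and therefore receive the same Gaussian noise, so comparing the clipped per-layer norms compares their signal levels.

```python
import math


def l2_norm(v: list[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def clip_global(update: dict[str, list[float]], c: float) -> dict[str, list[float]]:
    """Flat clipping: scale the whole update so its overall L2 norm is at most c."""
    total = math.sqrt(sum(l2_norm(d) ** 2 for d in update.values()))
    scale = min(1.0, c / total) if total > 0 else 1.0
    return {name: [x * scale for x in d] for name, d in update.items()}


def clip_per_layer_equal(update: dict[str, list[float]], c: float) -> dict[str, list[float]]:
    """Per-layer clipping with the assumed equal split C_l = c / sqrt(L), so sqrt(sum C_l^2) = c."""
    c_layer = c / math.sqrt(len(update))
    out = {}
    for name, d in update.items():
        norm = l2_norm(d)
        scale = min(1.0, c_layer / norm) if norm > 0 else 1.0
        out[name] = [x * scale for x in d]
    return out


def layer_norms(update: dict[str, list[float]]) -> dict[str, float]:
    return {name: l2_norm(d) for name, d in update.items()}


# Hypothetical imbalanced update: the "attention" layer's norm (100) dwarfs the "other" layer's (1).
update = {"attention": [60.0, 80.0], "other": [0.6, 0.8]}
C = 1.0  # same total sensitivity, hence the same Gaussian noise std, under both schemes

global_norms = layer_norms(clip_global(update, C))
per_layer_norms = layer_norms(clip_per_layer_equal(update, C))
print("global   :", global_norms)     # "other" shrinks to about 0.01
print("per-layer:", per_layer_norms)  # both layers are clipped to about 0.707
```

In this toy case the small layer's signal-to-noise ratio is roughly 70 times higher under per-layer clipping, at the cost of a somewhat lower ratio for the large layer; how the budget should be split across real layers is a separate design choice.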