Abstract: While federated learning (FL) has recently emerged as a promising approach to train machine learning models, its use in automatic speech recognition (ASR) has been limited to preliminary explorations. Moreover, FL does not inherently guarantee user privacy and requires the use of differential privacy (DP) for robust privacy guarantees. However, we are not aware of prior work on applying DP to FL for ASR. In this paper, we aim to bridge this research gap by formulating an ASR benchmark for FL with DP and establishing the first baselines. First, we extend the existing research on FL for ASR by exploring different aspects of recent large end-to-end transformer models: architecture design, seed models, data heterogeneity, domain shift, and impact of cohort size. With a practical number of central aggregations, we are able to train FL models that are nearly optimal even with heterogeneous data, a seed model from another domain, or no pre-trained seed model. Second, we apply DP to FL for ASR, which is non-trivial since DP noise severely affects model training, especially for large transformer models, due to highly imbalanced gradients in the attention block. We counteract the adverse effect of DP noise by reviving per-layer clipping and explaining why its effect is more apparent in our case than in prior work. Remarkably, we achieve user-level $(7.2, 10^{-9})$-DP (resp. $(4.5, 10^{-9})$-DP) with a 1.3% (resp. 4.6%) absolute drop in the word error rate for extrapolation to high (resp. low) population scale for FL with DP in ASR.
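
A minimal sketch of one central aggregation in FL with user-level DP, assuming the standard DP-FedAvg recipe (clip each user's model update, sum the clipped updates, add Gaussian noise at the server, average). The function names `clip_by_global_norm` and `dp_fedavg_round`, the server learning rate of 1, and the toy two-layer shapes are hypothetical illustrations, not the paper's implementation. Privacy accounting (turning the noise multiplier, sampling rate, and number of rounds into an $\varepsilon$) is not shown.

```python
import numpy as np

def clip_by_global_norm(update, clip_norm):
    """Scale a client update (list of per-layer arrays) to joint L2 norm <= clip_norm."""
    total = np.sqrt(sum(float(np.sum(u ** 2)) for u in update))
    scale = min(1.0, clip_norm / (total + 1e-12))
    return [u * scale for u in update]

def dp_fedavg_round(global_params, client_updates, clip_norm, noise_multiplier, rng):
    """One central aggregation of DP-FedAvg with the Gaussian mechanism.

    Each user contributes exactly one clipped update, so adding or removing a user
    changes the sum by at most clip_norm in L2 (user-level sensitivity).
    """
    cohort_size = len(client_updates)
    clipped = [clip_by_global_norm(u, clip_norm) for u in client_updates]
    # Sum the clipped updates layer by layer.
    summed = [np.sum(layers, axis=0) for layers in zip(*clipped)]
    # Gaussian noise with std = noise_multiplier * clip_norm on every coordinate.
    noised = [s + rng.normal(0.0, noise_multiplier * clip_norm, size=s.shape) for s in summed]
    # Average over the cohort; assumes (hypothetically) a server learning rate of 1.
    return [p + n / cohort_size for p, n in zip(global_params, noised)]

rng = np.random.default_rng(0)
params = [np.zeros((2, 2)), np.zeros(3)]  # toy two-layer "model"
updates = [[rng.normal(size=(2, 2)), rng.normal(size=3)] for _ in range(100)]
new_params = dp_fedavg_round(params, updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
print([p.shape for p in new_params])  # [(2, 2), (3,)]
```

For a fixed noise multiplier, the noise standard deviation on the averaged update is `noise_multiplier * clip_norm / cohort_size`, so larger cohorts dilute the DP noise. This is one reason cohort size and population scale matter in this setting.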

What problem does this paper attempt to address?

The paper aims to address the following issues:

1. **Application of federated learning (FL) to automatic speech recognition (ASR)**: Although FL has become a promising method for training machine learning models, its application to ASR has so far been limited to preliminary explorations. The paper extends existing research by exploring architecture design, seed models, data heterogeneity, domain shift, and the impact of cohort size (the number of clients participating in each central aggregation).
2. **Combining differential privacy (DP) with FL for ASR**: FL by itself does not guarantee user privacy and must be combined with DP for robust privacy guarantees. However, the authors are not aware of prior work applying DP to FL for ASR. The paper fills this gap by formulating an ASR benchmark for FL with DP and establishing the first baselines.

Specifically, the paper conducts research in the following areas:

- Exploring the performance of large end-to-end Transformer models in FL, including the impact of model architecture, seed models (including a seed model from another domain, or none at all), and data distribution on model performance.
- Applying DP to FL for ASR and reviving per-layer clipping to counteract DP noise, which is especially harmful for large Transformer models because gradients in the attention block are highly imbalanced.
- Validating FL models on datasets in multiple languages, including English, German, and French, and demonstrating that FL can train models close to the level of centralized training even in the presence of data heterogeneity and domain mismatch.

Through these studies, the paper provides empirical baselines and practical guidance for applying FL, with and without DP, to ASR.
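
A toy sketch of why per-layer clipping can help when gradient norms are highly imbalanced across layers. It assumes a hypothetical equal split of the clipping budget, $C/\sqrt{L}$ per layer for $L$ layers, so the concatenated update still has L2 norm at most $C$ and the Gaussian noise calibration is unchanged; the paper's actual per-layer bounds may be chosen differently. The two-layer "update" (one large, attention-like layer and one small layer) is invented for illustration.

```python
import numpy as np

def clip_global(update, clip_norm):
    """Flat clipping: one scale factor for all layers, based on the joint L2 norm."""
    total = np.sqrt(sum(float(np.sum(u ** 2)) for u in update))
    scale = min(1.0, clip_norm / (total + 1e-12))
    return [u * scale for u in update]

def clip_per_layer(update, clip_norm):
    """Per-layer clipping with a hypothetical equal budget clip_norm / sqrt(L) per layer.

    Since sum_l (clip_norm / sqrt(L))^2 = clip_norm^2, the joint norm stays <= clip_norm.
    """
    per_layer = clip_norm / np.sqrt(len(update))
    out = []
    for u in update:
        n = float(np.linalg.norm(u))
        out.append(u * min(1.0, per_layer / (n + 1e-12)))
    return out

# Toy imbalanced update: layer 0 has norm 100, layer 1 has norm 0.1.
update = [np.full(4, 50.0), np.full(4, 0.05)]
for name, fn in [("global", clip_global), ("per-layer", clip_per_layer)]:
    clipped = fn(update, clip_norm=1.0)
    print(name, [round(float(np.linalg.norm(x)), 4) for x in clipped])
# global [1.0, 0.001]
# per-layer [0.7071, 0.1]
```

With flat clipping, the small layer shrinks to norm 0.001 while the per-coordinate noise (`noise_multiplier * clip_norm`) is unchanged, so its signal is swamped. Per-layer clipping keeps it at 0.1 under the same total sensitivity.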

Federated Learning with Differential Privacy for End-to-End Speech Recognition

DP-DyLoRA: Fine-Tuning Transformer-Based Models On-Device under Differentially Private Federated Learning using Dynamic Low-Rank Adaptation

Training Large ASR Encoders with Differential Privacy

Using Decentralized Aggregation for Federated Learning with Differential Privacy

A Two-Stage Differential Privacy Scheme for Federated Learning Based on Edge Intelligence

Convergent Differential Privacy Analysis for General Federated Learning: the $f$-DP Perspective

Adaptive Differential Privacy in Federated Learning: A Priority-Based Approach

Belt and Braces: When Federated Learning Meets Differential Privacy

FDP-FL: Differentially Private Federated Learning with Flexible Privacy Budget Allocation

Differentially Private Federated Learning with an Adaptive Noise Mechanism

Enhancing Federated Learning with Adaptive Differential Privacy and Priority-Based Aggregation

Differentially Private Federated Learning on Heterogeneous Data

ULDP-FL: Federated Learning with Across Silo User-Level Differential Privacy

Federated Learning with Differential Privacy

Differentially Private Parameter-Efficient Fine-tuning for Large ASR Models

A Fine-Grained Differentially Private Federated Learning Against Leakage from Gradients

End-to-End Speech Recognition from Federated Acoustic Models

Differentially-Private Multi-Tier Federated Learning

Multi-Stage Asynchronous Federated Learning with Adaptive Differential Privacy