Jie Xu, Karthikeyan Saravanan, Rogier van Dalen, Haaris Mehmood, David Tuckey, Mete Ozay
Abstract: Federated learning (FL) allows clients to collaboratively train a global model without sharing their local data with a server. However, clients' contributions to the server can still leak sensitive information. Differential privacy (DP) addresses such leakage by providing formal privacy guarantees, with mechanisms that add randomness to the clients' contributions. The randomness makes it infeasible to train large transformer-based models, common in modern federated learning systems. In this work, we empirically evaluate the practicality of fine-tuning large-scale on-device transformer-based models with differential privacy in a federated learning system. We conduct comprehensive experiments on various system properties for tasks spanning a multitude of domains: speech recognition, computer vision (CV) and natural language understanding (NLU). Our results show that full fine-tuning under differentially private federated learning (DP-FL) generally leads to huge performance degradation which can be alleviated by reducing the dimensionality of contributions through parameter-efficient fine-tuning (PEFT). Our benchmarks of existing DP-PEFT methods show that DP-Low-Rank Adaptation (DP-LoRA) consistently outperforms other methods. An even more promising approach, DyLoRA, which makes the low rank variable, when naively combined with FL would straightforwardly break differential privacy. We therefore propose an adaptation method that can be combined with differential privacy and call it DP-DyLoRA. Finally, we are able to reduce the accuracy degradation and word error rate (WER) increase due to DP to less than 2% and 7% respectively with 1 million clients and a stringent privacy budget of $\epsilon=2$.
Machine Learning; Cryptography and Security; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper asks how large-scale transformer-based models can be fine-tuned effectively in federated learning under differential privacy while limiting the resulting performance degradation. Specifically, it focuses on the following aspects:
1. **Performance Degradation under Differential Privacy**: In differentially private federated learning (DP-FL), the noise added to clients' contributions causes model performance to drop significantly. The drop is especially pronounced when large-scale transformer models are fully fine-tuned.
2. **Effectiveness of Parameter-Efficient Fine-Tuning (PEFT) Methods**: Reducing the number of parameters that clients send to the server reduces the impact of the noise and thereby improves model performance. The paper evaluates several existing PEFT methods, such as LoRA, Adapter, Compacter, and BitFit, to determine which performs best under differential privacy.
3. **Improvement of Dynamic Low-Rank Adaptation (DyLoRA)**: DyLoRA makes the rank of the low-rank matrices variable, which avoids manual rank selection. However, naively combining DyLoRA with federated learning breaks differential privacy. The paper therefore proposes a new method, DP-DyLoRA, which addresses this by selecting the rank uniformly on the server side.
4. **Multi-Domain Experimental Verification**: The paper runs experiments on datasets from multiple domains, including natural language understanding, computer vision, and speech recognition, to verify the effectiveness of the proposed method.
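
The intuition behind points 1 and 2 can be illustrated with a minimal sketch. It assumes a standard clip-and-add-Gaussian-noise aggregation step, which the summary does not spell out, and every function name and parameter here (`clip_update`, `dp_aggregate`, `noise_multiplier`, and so on) is hypothetical rather than taken from the paper. Under that assumption, the root-mean-square norm of the noise in the averaged update grows with the square root of the number of transmitted parameters, which is why sending a low-rank update instead of a full weight matrix may help.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale a client update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_aggregate(updates, clip_norm, noise_multiplier, rng):
    """Sum the clipped updates, add Gaussian noise to the sum, then average.

    Assumption: noise standard deviation = noise_multiplier * clip_norm.
    """
    dim = len(updates[0])
    total = [0.0] * dim
    for update in updates:
        for i, x in enumerate(clip_update(update, clip_norm)):
            total[i] += x
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [x / len(updates) for x in noisy]


def rms_noise_norm(dim, clip_norm, noise_multiplier, num_clients):
    """Root-mean-square L2 norm of the noise left in the averaged update."""
    return noise_multiplier * clip_norm * math.sqrt(dim) / num_clients


def lora_param_count(d_in, d_out, rank):
    """Parameters in a rank-`rank` LoRA pair: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Illustrative sizes only: one 768 x 768 weight matrix versus rank-8 LoRA.
    full_dim = 768 * 768
    lora_dim = lora_param_count(768, 768, 8)
    for name, dim in [("full", full_dim), ("lora", lora_dim)]:
        print(name, dim, rms_noise_norm(dim, 1.0, 1.0, 1000))
```

With these illustrative sizes, the full matrix has 589,824 entries and the rank-8 LoRA pair has 12,288, a 48-fold reduction in the dimensionality that the noise is spread over. This shows only the scaling of the noise, not the privacy accounting or the effect on accuracy.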
### Main Contributions of the Paper
1. **Benchmarking Existing DP-PEFT Methods**: The paper benchmarks existing DP-PEFT methods in detail and compares their performance in differentially private federated learning.
2. **Proposing the DP-DyLoRA Algorithm**: The paper proposes a new algorithm, DP-DyLoRA, which further improves the privacy-utility trade-off by training over a range of ranks instead of fixing a single rank.
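
The rank-selection idea can be sketched as follows. This is an illustration built only from the description in this summary, not the paper's algorithm: it assumes the server draws one rank uniformly at random per round from a hypothetical range `[r_min, r_max]` and that every client then truncates its LoRA matrices to that shared rank. The function names and the nested-list matrix layout are likewise assumptions.

```python
import random


def sample_round_rank(r_min, r_max, rng):
    """Server side: draw one rank uniformly from [r_min, r_max] for the round."""
    return rng.randint(r_min, r_max)


def truncate_lora(A, B, rank):
    """Keep the first `rank` rows of A (r_max x d_in) and columns of B (d_out x r_max)."""
    A_b = [row[:] for row in A[:rank]]
    B_b = [row[:rank] for row in B]
    return A_b, B_b


def lora_delta(A, B):
    """Compute the low-rank weight update B @ A as a d_out x d_in nested list."""
    d_in = len(A[0])
    return [
        [sum(B[i][k] * A[k][j] for k in range(len(A))) for j in range(d_in)]
        for i in range(len(B))
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # r_max = 3, d_in = 2
    B = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]    # d_out = 2, r_max = 3
    rank = sample_round_rank(1, 3, rng)        # shared by all clients this round
    A_b, B_b = truncate_lora(A, B, rank)
    print(rank, lora_delta(A_b, B_b))
```

Because the rank is drawn once on the server, every client's contribution in a given round has the same shape, so a single clipping and noising step can be applied to all of them.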
### Experimental Results
- **Performance Improvement**: The experimental results show that DP-DyLoRA significantly outperforms existing DP-PEFT methods, including DP-Adapter, DP-Compacter, DP-BitFit, and DP-LoRA, on multiple datasets.
- **Privacy Assurance**: Under a stringent privacy budget (\(\epsilon = 2\)) with 1 million clients, DP-DyLoRA keeps the accuracy degradation below 2% and the word error rate (WER) increase below 7%.
- **Multi-Domain Applicability**: The experiments cover datasets from natural language understanding, computer vision, and speech recognition, which supports the broad applicability of DP-DyLoRA.
### Summary
By proposing the DP-DyLoRA algorithm, this paper substantially reduces the performance degradation incurred when fine-tuning large-scale transformer models in differentially private federated learning, offering a practical approach for real-world deployments.