Abstract: Advances in Natural Language Processing (NLP) have been driven in large part by the adoption of large pretrained language models (PLMs) such as BERT and RoBERTa. Nevertheless, growing concerns about data privacy and the enforcement of stringent data protection regulations, such as PDPA and GDPR, have highlighted the limitations of traditional centralized machine learning. Federated Learning (FL) is a promising alternative that mitigates privacy concerns by training models on client devices and aggregating their parameters on a central server, thus avoiding direct data transfer. Despite its potential, FL's application in NLP faces several challenges: every device must use the same model architecture, client data are often non-Independent and Identically Distributed (non-IID), and frequent transfers of model parameters cause significant communication overhead. Federated Distillation (FD) has been proposed to overcome these challenges by transferring knowledge through unlabeled public proxy datasets rather than through model parameters. This reduces communication costs and allows clients with different model architectures to collaborate. However, FD carries potential privacy risks and may cause substantial loss of previously acquired knowledge, which diminishes the model's effectiveness. To address these issues, we introduce the Privacy-preserving Federated Distillation Method for Pretraining Language Models (PFDP). Unlike conventional methods, PFDP injects noise into only a selected portion of a predetermined dataset, which limits the impact of the noise on the model's utility. In addition, PFDP uses transfer learning to improve the generalization ability of the global model and to reduce catastrophic forgetting. Extensive evaluation across multiple classification tasks demonstrates that PFDP improves accuracy while safeguarding privacy.
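To make the federated distillation idea concrete, the following is a minimal sketch of one communication round in which clients share outputs on a public proxy dataset instead of model parameters. It is not the PFDP algorithm: the abstract does not specify where or how noise is injected, so this sketch assumes Gaussian noise added to client logits for a randomly chosen fraction of proxy samples, and it provides no formal privacy guarantee. The function names `noisy_client_logits` and `aggregate_soft_labels`, and the parameters `noise_fraction` and `sigma`, are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def noisy_client_logits(logits_per_sample, noise_fraction, sigma, rng):
    """Perturb a random fraction of a client's proxy-set logits.

    Assumption: noise is Gaussian and applied to outputs, not to raw data.
    Returns the perturbed logits and the set of perturbed sample indices.
    """
    n = len(logits_per_sample)
    k = int(round(noise_fraction * n))
    chosen = set(rng.sample(range(n), k))
    perturbed = []
    for i, logits in enumerate(logits_per_sample):
        if i in chosen:
            perturbed.append([x + rng.gauss(0.0, sigma) for x in logits])
        else:
            perturbed.append(list(logits))
    return perturbed, chosen


def aggregate_soft_labels(client_logits):
    """Server step: average per-sample class probabilities across clients."""
    n_clients = len(client_logits)
    n_samples = len(client_logits[0])
    consensus = []
    for s in range(n_samples):
        probs = [softmax(client[s]) for client in client_logits]
        n_classes = len(probs[0])
        consensus.append(
            [sum(p[j] for p in probs) / n_clients for j in range(n_classes)]
        )
    return consensus


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical logits from two clients on four shared proxy samples.
    client_a = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [3.0, -1.0]]
    client_b = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 0.0]]
    uploads = [noisy_client_logits(c, 0.5, 0.1, rng)[0] for c in (client_a, client_b)]
    for row in aggregate_soft_labels(uploads):
        print([round(p, 3) for p in row])
```

In a full system each client would then train its own model toward the consensus soft labels, which is why clients can use different architectures: only predictions on the shared proxy set cross the network.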

FedID: Federated Interactive Distillation for Large-Scale Pretraining Language Models

FedDGP: Disentangling Global and Personal Models for Federated Learning

FEDBFPT: an Efficient Federated Learning Framework for BERT Further Pre-Training

FedDistill: Global Model Distillation for Local Model De-Biasing in Non-IID Federated Learning

PFDP: Privacy-preserving Federated Distillation Method for Pretraining Language Models

FedMD: Heterogenous Federated Learning via Model Distillation

FedFed: Feature Distillation Against Data Heterogeneity in Federated Learning

Federated Virtual Learning on Heterogeneous Data with Local-global Distillation

Unlocking the Potential of Federated Learning: The Symphony of Dataset Distillation via Deep Generative Latents

Fine-tuning Global Model Via Data-Free Knowledge Distillation for Non-IID Federated Learning

Federated Learning Via Input-Output Collaborative Distillation

Federated Distillation: A Survey

FedPD: Defying data heterogeneity through privacy distillation

FedTweet: Two-fold Knowledge Distillation for non-IID Federated Learning

FedDW: Distilling Weights through Consistency Optimization in Heterogeneous Federated Learning

OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning

FedRAD: Heterogeneous Federated Learning via Relational Adaptive Distillation

Towards Secure and Robust Federated Distillation in Distributed Cloud: Challenges and Design Issues

One-shot Federated Learning via Synthetic Distiller-Distillate Communication

Decentralized Federated Learning Via Mutual Knowledge Distillation

Data-Free Knowledge Filtering and Distillation in Federated Learning