Selective Knowledge Sharing for Privacy-Preserving Federated Distillation without A Good Teacher

Jiawei Shao, Fangzhao Wu, Jun Zhang
2023-12-15
Abstract: While federated learning is promising for privacy-preserving collaborative learning without revealing local data, it remains vulnerable to white-box attacks and struggles to adapt to heterogeneous clients. Federated distillation (FD), built upon knowledge distillation--an effective technique for transferring knowledge from a teacher model to student models--emerges as an alternative paradigm, which provides enhanced privacy guarantees and addresses model heterogeneity. Nevertheless, challenges arise due to variations in local data distributions and the absence of a well-trained teacher model, which leads to misleading and ambiguous knowledge sharing that significantly degrades model performance. To address these issues, this paper proposes a selective knowledge sharing mechanism for FD, termed Selective-FD. It includes client-side selectors and a server-side selector to accurately and precisely identify knowledge from local and ensemble predictions, respectively. Empirical studies, backed by theoretical insights, demonstrate that our approach enhances the generalization capabilities of the FD framework and consistently outperforms baseline methods.
Machine Learning
What problem does this paper attempt to address?
The paper addresses three key challenges in federated learning (FL):

1. **Privacy Protection Issue**: Although federated learning enables collaborative training without data leaving the local environment, it remains exposed to white-box privacy attacks, in which sensitive information about the dataset leaks through the shared model parameters.
2. **Communication Overhead Issue**: Exchanging models between clients and the server incurs high communication costs, and these costs grow with model size.
3. **Heterogeneity Issue**: Standard federated training requires all clients to adopt the same model architecture, which suits clients with differing computational resources poorly.

To address these issues, the authors propose a selective knowledge-sharing mechanism named Selective-FD. It is designed for federated distillation (FD), which differs from traditional federated learning in that clients exchange "knowledge" (model predictions) rather than model parameters. Without a good teacher model for guidance, however, federated distillation is susceptible to misleading and ambiguous knowledge, which can degrade model performance.

The core idea of Selective-FD is to filter out that misleading and ambiguous knowledge, using two kinds of selector:

- **Client-side selector**: identifies abnormal samples in the proxy dataset and withholds the local predictions for them, so that they do not mislead other clients.
- **Server-side selector**: averages the predictions uploaded by the clients, discards the averaged predictions with high entropy, and returns the remaining ones to the clients for knowledge distillation.

In the reported experiments, Selective-FD handles non-independent and identically distributed (non-IID) data well and significantly improves test accuracy.
Additionally, compared to traditional federated averaging (FedAvg), Selective-FD significantly reduces communication costs and provides stronger privacy protection. In particular, when hard labels are used for knowledge transfer, Selective-FD performs close to, or even better than, its soft-label variant.

In summary, the paper proposes a new federated distillation framework, Selective-FD, that targets the privacy, communication, and heterogeneity issues of federated learning, and it demonstrates the framework's effectiveness through experiments.
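
The server-side selection step described above (average the uploaded predictions, then drop high-entropy results) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names, the dictionary-based upload format, and the `max_entropy` threshold are all assumptions made for the example, and the client-side selector is modelled only by a client omitting a sample from its upload.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def server_select(client_uploads, max_entropy):
    """Average per-sample predictions and drop ambiguous ensembles.

    client_uploads: list of dicts, one per client, mapping a proxy-sample
        id to that client's predicted class probabilities. A client omits
        the samples its own selector rejected (assumed upload format).
    max_entropy: hypothetical threshold; averaged predictions whose
        entropy exceeds it are treated as ambiguous and discarded.
    """
    # Group the uploaded predictions by proxy-sample id.
    collected = {}
    for upload in client_uploads:
        for sample_id, probs in upload.items():
            collected.setdefault(sample_id, []).append(probs)

    selected = {}
    for sample_id, preds in collected.items():
        # Class-wise mean over the clients that shared this sample.
        mean = [sum(col) / len(preds) for col in zip(*preds)]
        if entropy(mean) <= max_entropy:
            selected[sample_id] = mean
    return selected


uploads = [
    # Client A shares predictions for all three proxy samples.
    {"x0": [0.9, 0.1], "x1": [0.9, 0.1], "x2": [0.95, 0.05]},
    # Client B disagrees on x1 and withholds x2 (rejected locally).
    {"x0": [0.8, 0.2], "x1": [0.1, 0.9]},
]
kept = server_select(uploads, max_entropy=0.5)
print(sorted(kept))
```

In this toy run the two clients contradict each other on `x1`, so its averaged prediction is close to uniform and is filtered out, while `x0` and `x2` are returned for distillation. How the threshold is chosen, and whether the surviving predictions are sent back as soft or hard labels, is left open here.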