Efficient Federated Unlearning under Plausible Deniability

Ayush K. Varshney, Vicenç Torra
2024-10-14
Abstract: Privacy regulations like the GDPR in Europe and the CCPA in the US give users the right to remove their data from ML applications. Machine unlearning addresses this by modifying the ML parameters in order to forget the influence of a specific data point on its weights. Recent literature has highlighted that the contribution from data point(s) can be forged with some other data points in the dataset with probability close to one. This allows a server to falsely claim unlearning without actually modifying the model's parameters. However, in distributed paradigms such as FL, where the server lacks access to the dataset and the number of clients is limited, claiming unlearning becomes a challenge. This paper introduces an efficient way to achieve federated unlearning by employing a privacy model which allows the FL server to plausibly deny the client's participation in the training up to a certain extent. We demonstrate that the server can generate a Proof-of-Deniability, where each aggregated update can be associated with at least x client updates. This enables the server to plausibly deny a client's participation. However, in the event of frequent unlearning requests, the server is required to adopt an unlearning strategy and, accordingly, update its model parameters. We also perturb the client updates in a cluster in order to avoid inference from an honest but curious server. We show that the global model satisfies differential privacy after T communication rounds. The proposed methodology has been evaluated on multiple datasets in different privacy settings. The experimental results show that our framework achieves comparable utility while providing a significant reduction in terms of memory (30 times), as well as retraining time (1.6-500769 times). The source code for the paper is available.
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
The paper attempts to address the problem of achieving efficient machine unlearning in a Federated Learning (FL) environment, i.e., effectively removing the influence of specific data points from the model without retraining the entire model. Specifically, the paper focuses on how to allow the server to plausibly deny the participation of a particular client in model training while ensuring privacy, thereby satisfying the user's right to request the deletion of their data.

### Background and Problem Description

With the implementation of privacy regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), users have the right to request the deletion of their personal data. In machine learning (ML) applications, this means there needs to be a mechanism to "forget" the user's contribution, i.e., to modify the model parameters so that the influence of specific data points on the model weights is eliminated. Existing machine unlearning methods are relatively easy to implement in centralized ML environments, but they face challenges in distributed environments like federated learning because the server does not have access to the complete dataset and the number of participating clients is limited.

### Main Contributions of the Paper

1. **Proposed a new federated unlearning framework**: This framework allows the server to provide "Proof-of-Deniability" (PoD) to plausibly deny the participation of a particular client in model training.
2. **Introduced a client-level differential privacy mechanism**: Protects the identity of clients participating in the aggregation process, preventing an honest but curious server from inferring specific client information.
3. **Theoretical analysis**: Proved that the global model satisfies (ϵ, δ)-differential privacy after multiple communication rounds.
4. **Experimental results**: Demonstrated significant improvements in computational efficiency and substantial reductions in server-side storage requirements.

### Method Overview

The paper proposes a variant of the Federated Averaging (FedAvg) algorithm based on integral privacy, called "Perturbed k-Anonymous Integrally Private Federated Averaging." The specific steps are as follows:

1. **Initialization and broadcasting**: The server initializes the global model and broadcasts the current global model to all clients in each communication round.
2. **Client updates**: Each client updates the model parameters based on local data.
3. **Clustering and representative selection**: The server clusters the clients' model updates based on a predefined distance threshold (∆) and randomly selects a representative for each cluster.
4. **Perturbation and aggregation**: The server adds noise to the selected representative models to protect client identities, then aggregates these perturbed model updates to generate a new global model.
5. **Storage and rollback**: The server stores the client indices of each cluster so that, when an unlearning request is received, it can remove the historical updates of the target client. If the number of model updates in a cluster falls below a predefined number (x), the server rolls back to the previous state and re-executes the unlearning mechanism.

### Deniability

Through the above method, the server can in some cases plausibly deny the participation of a particular client without actually performing the unlearning operation. Specifically, if there are enough other client model updates in a cluster, the server can provide "Proof-of-Deniability," indicating that a similar global model can be generated even without the participation of that client.

### Client-Level Privacy Protection

To further protect client identities, the paper introduces a client-level differential privacy mechanism. By adding Gaussian noise to the representative model of each cluster, any specific client's model update becomes indistinguishable, preventing the server from inferring which client was selected as the cluster representative.

### Summary

The paper proposes an efficient and privacy-preserving federated unlearning framework that can plausibly deny client participation in a federated learning environment while maintaining model performance and efficiency. This provides a new solution for implementing the right to delete user data, especially in scenarios with strict data privacy requirements.
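
### Illustrative Sketch

The clustering, representative selection, perturbation, aggregation, and deniability check described in the method overview can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the greedy seed-based clustering rule, the unweighted mean over representatives, the interpretation that a client is deniable when its cluster still holds at least `x` updates without it, and all function and parameter names (`cluster_updates`, `aggregate_round`, `can_deny`, `delta`, `sigma`) are hypothetical. The noise scale `sigma` is a free parameter here and is not calibrated to any (ϵ, δ) budget.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Update = Sequence[float]


def l2_distance(a: Update, b: Update) -> float:
    """Euclidean distance between two flattened model updates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_updates(updates: Dict[str, Update], delta: float) -> List[List[str]]:
    """Group client ids whose updates lie within `delta` of a cluster's first member.

    Assumption: a simple greedy rule; the source only states that clustering
    uses a predefined distance threshold.
    """
    clusters: List[List[str]] = []
    for client_id, update in updates.items():
        for members in clusters:
            if l2_distance(update, updates[members[0]]) <= delta:
                members.append(client_id)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([client_id])
    return clusters


def aggregate_round(
    updates: Dict[str, Update], delta: float, sigma: float, rng: random.Random
) -> Tuple[List[float], List[List[str]]]:
    """Pick one random representative per cluster, add Gaussian noise, and average.

    Returns the aggregated update and the cluster membership that the server
    would store in order to answer later unlearning requests.
    """
    clusters = cluster_updates(updates, delta)
    dim = len(next(iter(updates.values())))
    total = [0.0] * dim
    for members in clusters:
        representative = rng.choice(members)
        noisy = [w + rng.gauss(0.0, sigma) for w in updates[representative]]
        total = [t + n for t, n in zip(total, noisy)]
    # Assumption: unweighted mean over the perturbed representatives.
    return [t / len(clusters) for t in total], clusters


def can_deny(clusters: List[List[str]], client_id: str, x: int) -> bool:
    """Return True if the client's cluster keeps at least `x` updates without it."""
    for members in clusters:
        if client_id in members:
            return len(members) - 1 >= x
    return True  # the client did not take part in this round
```

In this sketch, a server receiving an unlearning request would call `can_deny` on the stored clusters for each round. A `False` result corresponds to the case in which the source says the server must roll back and actually update the model.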