Secure Shapley Value for Cross-Silo Federated Learning (Technical Report)

Shuyuan Zheng, Yang Cao, Masatoshi Yoshikawa
DOI: https://doi.org/10.14778/3587136.3587141
2023-07-03
Abstract: The Shapley value (SV) is a fair and principled metric for contribution evaluation in cross-silo federated learning (cross-silo FL), wherein organizations, i.e., clients, collaboratively train prediction models with the coordination of a parameter server. However, existing SV calculation methods for FL assume that the server can access the raw FL models and public test data. This may not be a valid assumption in practice considering the emerging privacy attacks on FL models and the fact that test data might be clients' private assets. Hence, we investigate the problem of secure SV calculation for cross-silo FL. We first propose HESV, a one-server solution based solely on homomorphic encryption (HE) for privacy protection, which has limitations in efficiency. To overcome these limitations, we propose SecSV, an efficient two-server protocol with the following novel features. First, SecSV utilizes a hybrid privacy protection scheme to avoid ciphertext-ciphertext multiplications between test data and models, which are extremely expensive under HE. Second, an efficient secure matrix multiplication method is proposed for SecSV. Third, SecSV strategically identifies and skips some test samples without significantly affecting the evaluation accuracy. Our experiments demonstrate that SecSV is 7.2-36.6 times as fast as HESV, with a limited loss in the accuracy of calculated SVs.
Cryptography and Security
What problem does this paper attempt to address?
This paper attempts to address the problem of how to securely compute the Shapley value (SV) in cross-silo federated learning (cross-silo FL). Specifically, existing methods assume that the server can access the original local models and public test data when computing the SV, which may not be feasible in practical applications due to the risk of privacy attacks on federated learning models and the possibility that test data may be private assets of the clients. Therefore, the paper proposes a secure SV computation method that ensures the effective calculation of each participant's contribution to model training while protecting the privacy of the models and test data.

### Background and Problem Description

1. **Cross-silo Federated Learning**: Multiple organizations (such as banks, hospitals, etc.) collaborate to train machine learning models without sharing user data to protect privacy. Each client trains a model locally and uploads the local model to the server for aggregation to form a global model.
2. **Shapley Value**: Used to evaluate each client's contribution to the final model, it is a fair and reasonable method. However, existing SV computation methods assume that the server can access the original local models and public test data, which poses a risk of privacy leakage in practical applications.
3. **Privacy Protection Needs**: Test data is often a private asset of the clients and needs to be protected. At the same time, local models may also contain sensitive information that needs to be prevented from being accessed by the server or other clients.

### Main Contributions of the Paper

1. **HESV Scheme**: Proposes a single-server protocol HESV based on homomorphic encryption (HE) that can protect the privacy of models and test data. However, HESV has limitations in computational efficiency, mainly because ciphertext-ciphertext multiplication under homomorphic encryption is very time-consuming.
2. **SecSV Scheme**: Proposes a dual-server protocol SecSV, which significantly improves computational efficiency through a hybrid privacy protection scheme (using homomorphic encryption for models and additive secret sharing for test data) and an efficient matrix multiplication method. Additionally, SecSV introduces the SampleSkip method, which further accelerates the SV computation process by skipping some test samples.

### Key Technologies of the Solution

1. **Hybrid Privacy Protection Scheme**: Uses homomorphic encryption for models and additive secret sharing (ASS) for test data, avoiding ciphertext-ciphertext multiplication and improving computational efficiency.
2. **Efficient Matrix Multiplication**: Proposes the Matrix Reducing method, which is more efficient than the Matrix Squaring method in HESV for a large number of test samples.
3. **SampleSkip Method**: Reduces the computational load and improves efficiency by identifying and skipping test samples that have a minor impact on model performance.

### Experimental Results

Experimental results show that SecSV achieves 7.2 to 36.6 times acceleration compared to HESV in various machine learning tasks (such as image recognition, news classification, bank marketing, miRNA targeting, etc.), with limited loss in the accuracy of SV computation.

### Conclusion

This paper proposes a secure and efficient SV computation method, SecSV, which addresses the issue of protecting the privacy of models and test data in cross-silo federated learning, providing technical support for the fair evaluation of each participant's contribution.
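
### Illustrative Sketch

The two building blocks named in this summary, the Shapley value and additive secret sharing of test data, can be illustrated with a small Python sketch. This is not the paper's HESV or SecSV protocol. It only shows (a) the standard exact Shapley value as a weighted average of marginal contributions, and (b) why a linear layer can be evaluated on two additive shares separately and then recombined. The toy accuracy table, the modulus, and all function names are hypothetical. The plaintext weight matrix stands in for a model that, per the summary, would be held under homomorphic encryption.

```python
import itertools
import math
import random

# Hypothetical prime modulus for the toy sharing scheme (not from the paper).
MODULUS = 2**31 - 1


def shapley_values(clients, utility):
    """Exact Shapley values: weighted average of each client's marginal gain."""
    n = len(clients)
    values = {}
    for client in clients:
        others = [c for c in clients if c != client]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (utility(coalition | {client}) - utility(coalition))
        values[client] = total
    return values


def share(vector, rng):
    """Split an integer vector into two additive shares modulo MODULUS."""
    share_a = [rng.randrange(MODULUS) for _ in vector]
    share_b = [(v - a) % MODULUS for v, a in zip(vector, share_a)]
    return share_a, share_b


def linear_layer(weights, vector):
    """Matrix-vector product modulo MODULUS; models one server's local work."""
    return [sum(w * v for w, v in zip(row, vector)) % MODULUS for row in weights]


def reconstruct(out_a, out_b):
    """Add the two servers' partial outputs to recover the true product."""
    return [(a + b) % MODULUS for a, b in zip(out_a, out_b)]


if __name__ == "__main__":
    # Made-up test accuracies for every coalition of two clients (illustration only).
    toy_accuracy = {
        frozenset(): 0.0,
        frozenset("A"): 0.5,
        frozenset("B"): 0.6,
        frozenset("AB"): 0.8,
    }
    print(shapley_values(["A", "B"], lambda s: toy_accuracy[frozenset(s)]))

    # One test sample is split into two shares; each server sees only its share.
    weights = [[1, 2], [3, 4]]
    sample = [5, 6]
    share_a, share_b = share(sample, random.Random(0))
    combined = reconstruct(linear_layer(weights, share_a),
                           linear_layer(weights, share_b))
    print(combined)  # equals weights @ sample, i.e. [17, 39]
```

Because the exact Shapley value evaluates the utility on every coalition, each additional client doubles the number of model evaluations, which is why the cost of each secure evaluation matters. The share-and-recombine step works only for linear operations such as matrix multiplication; how non-linear activations are handled is left to the paper itself.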