A Multidata-Source Privacy-Preserving Approach: A Semisupervised Learning-Based Model for Migrating Data Annotation

Shuai Li, Jialiang Zhang, Liang Hu, Chengyu Sun, Juncheng Hu, Hongtu Li
DOI: https://doi.org/10.2139/ssrn.4163369
2022-01-01
SSRN Electronic Journal
Abstract: In the era of big data, a single computing node can no longer meet the demand, because large quantities of data and larger models require more computing resources than one machine can provide. Practical data application scenarios therefore usually adopt distributed machine learning, and when the data are privacy-sensitive, the centralized scheduling used in distributed machine learning creates a serious risk of leaking user data. In this paper, we design a multidata-source privacy-preserving model based on the PATE (Private Aggregation of Teacher Ensembles) model combined with ideas from distributed machine learning. We call this model the multiparty semisupervised-based knowledge transfer learning privacy-preserving model (MSSKT). In this method, models that make predictions from the private data are trained by multiple teacher parties, and their predictions are sent to the student party for aggregation. The privacy of the "student" model can be understood intuitively, since no single teacher, and therefore no single dataset, determines the student's training; it can also be understood formally as differential privacy (DP). Thus, even if an attacker compromises both the "student" model and a single "teacher" model, which is not publicly available, the protection of the sensitive data still holds. Our experiments confirm the effectiveness of this approach.
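To make the teacher-to-student aggregation step more concrete, the following is a minimal sketch of the Laplace "noisy max" aggregator from the original PATE framework, on which MSSKT is built. The abstract does not specify MSSKT's exact aggregation mechanism, noise distribution, or privacy parameters, so this is an assumption for illustration only. The function names (`noisy_max_aggregate`, `label_public_queries`) and the parameter `gamma` are hypothetical. Each teacher party votes a label for a query from the student's unlabeled data, Laplace noise is added to each class's vote count, and the student receives only the noisy winning label.

```python
import random
from collections import Counter
from typing import List, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_max_aggregate(teacher_votes: Sequence[int], num_classes: int,
                        gamma: float, rng: random.Random) -> int:
    """Return the class with the highest noisy vote count (PATE-style noisy max).

    Hypothetical helper: teacher_votes holds one predicted label per teacher
    for a single student query; noise is Laplace with scale 1/gamma.
    """
    counts = Counter(teacher_votes)  # missing classes count as 0
    noisy = [counts[c] + laplace(1.0 / gamma, rng) for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: noisy[c])


def label_public_queries(all_votes: Sequence[Sequence[int]], num_classes: int,
                         gamma: float, seed: int = 0) -> List[int]:
    # The student party aggregates teacher votes for each of its unlabeled queries;
    # the resulting noisy labels are what the student would then train on.
    rng = random.Random(seed)
    return [noisy_max_aggregate(v, num_classes, gamma, rng) for v in all_votes]


if __name__ == "__main__":
    # Ten hypothetical teachers: nine vote for class 2, one votes for class 0.
    votes = [[2] * 9 + [0]]
    print(label_public_queries(votes, num_classes=3, gamma=100.0))
```

The single parameter `gamma` captures the usual trade-off: a smaller `gamma` means larger noise, which gives stronger differential-privacy guarantees per query but makes the aggregated labels less reliable when the teachers disagree. Because the student sees only noisy aggregate labels, no single teacher's dataset determines its training, which matches the intuition stated in the abstract.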