Differentially Private K-Means Publishing with Distributed Dimensions

Boyu Zhu, Yuan Zhang, Tingting Chen, Sheng Zhong
DOI: https://doi.org/10.1109/cscwd61410.2024.10580021
2024-01-01
Abstract: In this paper, we address dataset privacy concerns that arise when k-means clustering results are published in a distributed-dimension setting. Leveraging differential privacy mechanisms, we propose a novel framework that integrates two components: a differentially private classifier, constructed by voting over the raw clustering results, and an enhanced generative adversarial network (GAN) that simulates the classifier’s behavior when inferring class labels for a public dataset. Our approach generates synthetic clustering results that mimic the real outcomes in classification tasks while ensuring differential privacy and minimizing noise. Our contributions include a comprehensive exploration of the privacy issues, the introduction of a novel privacy-preserving k-means clustering framework, and theoretical analyses of its sensitivity and differential privacy guarantees. Evaluation on the MNIST dataset demonstrates the effectiveness of the framework: it achieves 82.22% accuracy with a (10.48, 10^-9)-differential-privacy guarantee, compared to 83.45% accuracy without privacy preservation.
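The abstract does not spell out how the voting-based classifier is made differentially private. As a rough illustration only, the sketch below assumes a PATE-style noisy-max aggregation: each vote count is perturbed with Laplace noise before the winning label is chosen. The function names, the Laplace scale of 2/epsilon, and the sample votes are all hypothetical and are not taken from the paper; which entities cast the votes, and how cluster labels are aligned across them, is likewise left open here.

```python
import random
from collections import Counter


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_vote(labels, num_classes, epsilon, rng):
    """Return the class whose Laplace-perturbed vote count is largest.

    Assumption (not from the paper): noise scale 2/epsilon, as in
    PATE-style aggregation, where changing one vote alters two counts
    by one each.
    """
    counts = Counter(labels)
    noisy = [counts.get(c, 0) + laplace(2.0 / epsilon, rng)
             for c in range(num_classes)]
    return max(range(num_classes), key=noisy.__getitem__)


rng = random.Random(0)
# Hypothetical votes for one public sample: 40 votes for label 3,
# 5 for label 7, and 5 for label 1.
votes = [3] * 40 + [7] * 5 + [1] * 5
label = noisy_vote(votes, num_classes=10, epsilon=1.0, rng=rng)
print(label)
```

When the votes agree strongly, as in this toy input, the noise rarely changes the winner; when they are split, the returned label becomes noticeably random, which is the price paid for the privacy guarantee.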