GAPBAS: Genetic Algorithm-based Privacy Budget Allocation Strategy in Differential Privacy K-Means Clustering Algorithm

Yong Li, Xiao Song, Yuchun Tu, Ming Liu
DOI: https://doi.org/10.1016/j.cose.2023.103697
IF: 5.105
2024-01-01
Computers & Security
Abstract: The differential privacy k-means (DP k-means) clustering algorithm emerged to address privacy protection challenges in data mining. However, the algorithm has difficulty achieving both clustering usability and convergence. The privacy budget (ε), a critical parameter that determines how much noise a differential privacy algorithm adds, has attracted significant attention, and researchers have shifted their focus to privacy budget allocation strategies within the DP k-means clustering algorithm. Selecting a privacy budget allocation strategy for DP k-means is, however, an NP-hard problem. Our initial intuition is that genetic algorithms can efficiently discover relatively optimal privacy budget sequences. In this context, we propose a genetic algorithm-based privacy budget allocation strategy (GAPBAS) to ensure the convergence and usability of the DP k-means algorithm. First, convergence is ensured by selecting improved initial centroids and rigorously controlling the minimum privacy budget for the DP k-means algorithm. Second, privacy budget allocation in the DP k-means algorithm is reformulated as a combinatorial optimization problem: the privacy budgets of the individual iterative rounds are merged into a single sequence, and a genetic algorithm selects the optimal privacy budget allocation strategy, thereby significantly enhancing the usability of the DP k-means algorithm. Comparative experiments against four other privacy budget allocation strategies for the DP k-means algorithm demonstrate the superior performance of the genetic algorithm-based strategy at the same level of privacy protection.
computer science, information systems
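The idea in the abstract (treat the per-round privacy budgets as one sequence and let a genetic algorithm search over such sequences, subject to a minimum per-round budget) can be illustrated with a minimal sketch. This is not the authors' implementation. Every name here (`gapbas_sketch`, `repair`, `dp_kmeans`) is hypothetical, and the design choices are assumptions: Laplace noise on per-cluster sums and counts, 2-D points scaled to [0, 1], caller-supplied initial centroids rather than the paper's improved initialization, negative sum of squared errors as the fitness, and tournament selection with one-point crossover and Gaussian mutation.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def repair(seq, total_eps, eps_min):
    # Map a candidate onto the feasible set: every round >= eps_min, sum == total_eps.
    slack = total_eps - eps_min * len(seq)
    if slack < 0:
        raise ValueError("total_eps is too small for the requested eps_min")
    extra = [max(e - eps_min, 0.0) for e in seq]
    s = sum(extra)
    if s == 0.0:
        return [total_eps / len(seq)] * len(seq)
    return [eps_min + slack * x / s for x in extra]

def sse(points, centroids):
    # Sum of squared distances from each point to its nearest centroid.
    return sum(min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centroids)
               for px, py in points)

def dp_kmeans(points, init, budgets, rng):
    # One Lloyd iteration per budget entry; each iteration spends budgets[t].
    centroids = list(init)
    for eps in budgets:
        stats = [[0.0, 0.0, 0.0] for _ in centroids]  # x-sum, y-sum, count
        for px, py in points:
            j = min(range(len(centroids)),
                    key=lambda i: (px - centroids[i][0]) ** 2 + (py - centroids[i][1]) ** 2)
            stats[j][0] += px
            stats[j][1] += py
            stats[j][2] += 1
        # Assumption: points lie in [0, 1]^2, so (x-sum, y-sum, count) has L1 sensitivity 3.
        scale = 3.0 / eps
        updated = []
        for (sx, sy, n), old in zip(stats, centroids):
            noisy_n = n + laplace(scale, rng)
            if noisy_n < 1.0:
                updated.append(old)  # keep the old centroid if the noisy count is unusable
                continue
            cx = min(max((sx + laplace(scale, rng)) / noisy_n, 0.0), 1.0)
            cy = min(max((sy + laplace(scale, rng)) / noisy_n, 0.0), 1.0)
            updated.append((cx, cy))
        centroids = updated
    return centroids

def gapbas_sketch(points, init, rounds, total_eps, eps_min,
                  pop_size=12, generations=5, seed=0):
    rng = random.Random(seed)

    def fitness(seq):
        # Assumed fitness: negative SSE averaged over three noisy runs.
        return -sum(sse(points, dp_kmeans(points, init, seq, rng)) for _ in range(3)) / 3

    pop = [repair([eps_min + rng.random() for _ in range(rounds)], total_eps, eps_min)
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(s), s) for s in pop), key=lambda t: t[0], reverse=True)
        children = [scored[0][1]]  # elitism: carry the best sequence forward
        while len(children) < pop_size:
            a = max(rng.sample(scored, 3), key=lambda t: t[0])[1]  # tournament selection
            b = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            cut = rng.randrange(1, rounds)                         # one-point crossover
            child = [e + rng.gauss(0.0, 0.1 * total_eps / rounds) if rng.random() < 0.2 else e
                     for e in a[:cut] + b[cut:]]                   # Gaussian mutation
            children.append(repair(child, total_eps, eps_min))
        pop = children
    return max(pop, key=fitness)

# Toy usage: two synthetic clusters in the unit square.
data_rng = random.Random(1)
points = [(min(max(data_rng.gauss(c, 0.05), 0.0), 1.0),
           min(max(data_rng.gauss(c, 0.05), 0.0), 1.0))
          for c in (0.25, 0.75) for _ in range(30)]
best = gapbas_sketch(points, init=[(0.4, 0.4), (0.6, 0.6)],
                     rounds=4, total_eps=2.0, eps_min=0.1)
```

The `repair` step is what keeps every candidate comparable "at the same level of privacy protection": whatever crossover and mutation produce, the sequence is rescaled so that it spends exactly `total_eps` and never drops a round below `eps_min`.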