DP-QIC: A differential privacy scheme based on quasi-identifier classification for big data publication
Si Chen, Anmin Fu, Shui Yu, Haifeng Ke, Mang Su
DOI: https://doi.org/10.1007/s00500-021-05692-7
IF: 3.732
2021-03-12
Soft Computing
Abstract: With the advent of the big data era, data privacy protection has become an important topic in data publication. Unfortunately, traditional privacy protection methods, such as k-anonymity and its extensions, are not absolutely secure, because an adversary with background knowledge can determine the owner of a record. Differential privacy offers a reasonable alternative for privacy protection, but existing solutions ignore the correlation between sensitive attributes and other attributes. In this paper, we propose a new differential privacy scheme based on quasi-identifier classification for big data publication (DP-QIC), a data publishing scheme built on obfuscating attribute correlation. We introduce a quasi-identifier classification based on sensitive attributes, together with a privacy ratio for evaluating the vulnerability of a data set. DP-QIC protects data privacy through the following steps: data collection, grouping and shuffling, generalization, merging, and noise adding, which together retain the overall statistical characteristics of the data set. Moreover, the exponential mechanism and the Laplace mechanism are integrated to provide greater flexibility and a stronger level of privacy protection, so DP-QIC can be applied to the privacy processing of different data groups in future work. Finally, we compare the performance of our scheme with two other well-known schemes in the field. Experimental results demonstrate that DP-QIC has clear advantages in data utility, privacy protection, and processing efficiency.
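The abstract names the Laplace and exponential mechanisms but does not say how DP-QIC combines them. The following is a minimal sketch of the two standard mechanisms only, not the authors' implementation. It assumes a numeric count query with sensitivity 1 for the Laplace step and a choice among generalization levels scored by a utility function for the exponential step. The function names, the generalization levels, and the utility scores are hypothetical and serve only as illustration.

```python
import math
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float, rng: random.Random) -> float:
    """Return true_value perturbed with Laplace(0, sensitivity / epsilon) noise."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def exponential_probabilities(scores: Sequence[float], sensitivity: float,
                              epsilon: float) -> list[float]:
    """Selection probabilities proportional to exp(epsilon * u / (2 * sensitivity))."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    top = max(scores)
    # Shifting by the max score avoids overflow and leaves the ratios unchanged.
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates: Sequence[T], utility: Callable[[T], float],
                          sensitivity: float, epsilon: float,
                          rng: random.Random) -> T:
    """Pick one candidate, favoring higher utility, under epsilon-DP."""
    probs = exponential_probabilities([utility(c) for c in candidates],
                                      sensitivity, epsilon)
    return rng.choices(list(candidates), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical generalization levels for an "age" quasi-identifier,
    # with made-up utility scores (finer levels keep more information).
    levels = ["exact", "5-year band", "10-year band", "*"]
    utility = {"exact": 3.0, "5-year band": 2.0, "10-year band": 1.0, "*": 0.0}
    chosen = exponential_mechanism(levels, utility.get, sensitivity=1.0,
                                   epsilon=1.0, rng=rng)
    # Hypothetical count of records in one published group.
    noisy_count = laplace_mechanism(120.0, sensitivity=1.0, epsilon=0.5, rng=rng)
    print(chosen, round(noisy_count, 2))
```

In this sketch the two mechanisms spend separate budgets (1.0 for selection, 0.5 for noise). By sequential composition, running both on the same data costs 1.5 in total. How DP-QIC actually allocates its budget across groups is not stated in the abstract.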
computer science, artificial intelligence, interdisciplinary applications