A divide-and-conquer approach to privacy-preserving high-dimensional big data release

Rong Wang, Junchuan Liang, Siyu Wang, Chin-Chen Chang
DOI: https://doi.org/10.1016/j.jisa.2024.103756
IF: 4.96
2024-04-11
Journal of Information Security and Applications
Abstract: Data anonymization has been used extensively in data-sharing scenarios to protect the privacy of people's raw data. However, in the era of Big Data, the amount of data released has increased so rapidly that most existing data anonymization approaches have become ineffective: they do not scale adequately to large data sets, and they cannot handle the sparseness of the high-dimensional search space. In this paper, we propose a MapReduce-based approach to address the problem of anonymizing high-dimensional big data. First, our approach uses a vertical partition criterion based on normalized mutual information to decompose raw data into fragments of smaller dimensionality. Then, a clustering-based local recoding is used to group the records of each fragment into clusters. During this phase, records with similar values of quasi-identifier attributes but dissimilar values of sensitive attributes tend to be grouped together. Finally, the clusters of each fragment are anonymized to simultaneously resist (1) disclosure of individual identity and (2) proximity breaches. Our proposed approach is integrated with MapReduce to implement parallel distributed computing. Experiments on three public data sets demonstrated that our approach outperformed the compared approaches in terms of efficiency and scalability.
computer science, information systems
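The abstract names normalized mutual information (NMI) as the basis of the vertical partition criterion but does not give its exact form. As a rough illustration only, the sketch below computes NMI between two categorical attribute columns. The normalization by the geometric mean of the two entropies, and the function names `entropy` and `normalized_mutual_information`, are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def normalized_mutual_information(xs, ys):
    """NMI of two equally long discrete columns.

    Assumption: normalize I(X;Y) by sqrt(H(X) * H(Y)); the paper may
    use a different normalization.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # I(X;Y) = sum over observed pairs of p(x,y) * log2(p(x,y) / (p(x) * p(y)))
    mi = sum(
        (c / n) * math.log2((c * n) / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0 or hy == 0:
        # A constant column carries no information about any other column.
        return 0.0
    return mi / math.sqrt(hx * hy)
```

Under this normalization, two identical columns score 1 and two statistically independent columns score 0. How such pairwise scores are turned into fragments is not described in the abstract, so that step is left out here.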