Privacy-Preserving Sequential Data Publishing

Huili Wang, Wenping Ma, Haibin Zheng, Zhi Liang, Qianhong Wu
DOI: https://doi.org/10.1007/978-3-030-36938-5_37
2019-01-01
Abstract: Machine learning in artificial intelligence relies on legitimately obtained big data, and the process of publishing such data raises many privacy issues. m-Invariance is a fundamental privacy-preserving notion in microdata republication. Unfortunately, when applied to big data release, the existing generalization-based m-Invariance must modify the original microdata, which causes loss of data utility and poor aggregate query performance. Furthermore, because quasi-identifiers in big data are high-dimensional, the cost of generalization operations becomes unaffordable, making the approach impractical. In this paper, we remedy these drawbacks to achieve m-Invariance in big data release. We first propose a new anatomy-based m-Invariance definition and framework, where the anatomy approach achieves privacy by breaking the correlations between the sensitive attributes and the non-sensitive identifiers. We next establish a series of criteria for anatomy to cope with republication caused by data dynamics. We then develop an algorithm that realizes these ideas. Theoretical and experimental analyses confirm the advantages of our anatomy-based m-Invariance approach in terms of data utility, aggregate query accuracy, and the capacity to handle high-dimensional quasi-identifiers in big data release.
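The anatomy idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it shows only the generic anatomy step of publishing the quasi-identifiers and the sensitive values in two separate tables linked by a group ID, plus a check that every group holds at least m tuples with pairwise distinct sensitive values (the per-release condition that m-Invariance is usually built on). The function names `anatomize` and `is_m_unique`, the attribute names, the toy records, and the hand-picked grouping are all assumptions made for illustration.

```python
from collections import Counter


def anatomize(records, groups, sensitive_key):
    """Split records into a quasi-identifier table (QIT) and a sensitive table (ST).

    records: list of dicts, one per tuple.
    groups: list of lists of record indices (hypothetical, chosen by the caller).
    """
    qit, st = [], []
    for gid, members in enumerate(groups, start=1):
        # QIT keeps the exact quasi-identifier values, unmodified, plus the group ID.
        for i in members:
            row = {k: v for k, v in records[i].items() if k != sensitive_key}
            row["group_id"] = gid
            qit.append(row)
        # ST lists each sensitive value of the group with its count, without
        # saying which tuple of the group it belongs to.
        counts = Counter(records[i][sensitive_key] for i in members)
        for value, count in sorted(counts.items()):
            st.append({"group_id": gid, sensitive_key: value, "count": count})
    return qit, st


def is_m_unique(records, groups, sensitive_key, m):
    """Return True if every group has at least m tuples, all with distinct sensitive values."""
    for members in groups:
        values = [records[i][sensitive_key] for i in members]
        if len(values) < m or len(set(values)) != len(values):
            return False
    return True


# Toy microdata (invented for this example).
records = [
    {"age": 23, "zip": "12000", "disease": "flu"},
    {"age": 27, "zip": "14000", "disease": "bronchitis"},
    {"age": 41, "zip": "18000", "disease": "flu"},
    {"age": 45, "zip": "20000", "disease": "gastritis"},
]
groups = [[0, 1], [2, 3]]

qit, st = anatomize(records, groups, "disease")
ok_for_m2 = is_m_unique(records, groups, "disease", 2)
```

Because the quasi-identifier values in `qit` are published unmodified, aggregate queries over them do not suffer from generalization error; an observer who locates an individual in a group learns only that the sensitive value is one of those listed for that group in `st`. Handling republication would additionally require comparing groups across releases, which this sketch does not attempt.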