Privacy Protecting by Multiattribute Clustering in Data-Intensive Service

Qing Zhu, Ning Li
DOI: https://doi.org/10.1109/TrustCom.2012.224
2012-01-01
Abstract: With the explosive growth of big data, organizations are strongly encouraged to release their micro-data to support data-intensive analysis services, to provide new business opportunities, and to enable scientific studies of every kind. However, releasing medical records about individuals violates their privacy; thus, privacy-preserving data publishing has become a critical issue for companies and organizations. Existing anonymization techniques operate mainly on quasi-identifier attributes and do not consider the relationships between different values of a sensitive attribute, which can reveal individuals' private information. This paper studies in detail the correlation between sensitive attribute values, builds on the idea of protecting the original data through lossy join, and proposes the Twice-privacy algorithm, based on a utility matrix and multiattribute clustering. Twice-privacy clusters sensitive values to protect against similarity attacks and assigns different weights to quasi-identifier attributes to retain their usefulness for query services; the data produced by the clustering algorithm are of high accuracy and high value. Experimental results on real datasets show the effectiveness and efficiency of the Twice-privacy algorithm. Our solution reduces the similarity attack rate to 0%. Meanwhile, the query correctness rate and analysis correctness rate of the proposed method improve noticeably, and query accuracy and analysis accuracy are also enhanced.
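The similarity attack mentioned in the abstract can be illustrated with a minimal sketch. This is not the Twice-privacy algorithm itself, whose details the abstract does not give. It only shows one plausible way to measure a similarity attack rate: the share of anonymized groups whose sensitive values all fall into the same semantic category. The category map, the record fields (`age`, `zip`, `disease`), and both function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical semantic categories for sensitive values (illustrative only).
CATEGORY = {
    "gastric ulcer": "stomach",
    "gastritis": "stomach",
    "stomach cancer": "stomach",
    "flu": "respiratory",
    "bronchitis": "respiratory",
}


def group_by_quasi_identifier(records, qi_keys, sensitive_key="disease"):
    """Collect the sensitive values of records sharing the same quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[k] for k in qi_keys)
        groups[key].append(rec[sensitive_key])
    return dict(groups)


def similarity_attack_rate(groups, category=CATEGORY):
    """Fraction of groups whose sensitive values all map to one semantic category."""
    if not groups:
        return 0.0
    exposed = sum(
        1 for values in groups.values()
        if len({category[v] for v in values}) == 1
    )
    return exposed / len(groups)


# Toy data with already generalized quasi-identifiers (assumed, not from the paper).
records = [
    {"age": "20-29", "zip": "476**", "disease": "gastric ulcer"},
    {"age": "20-29", "zip": "476**", "disease": "gastritis"},
    {"age": "20-29", "zip": "476**", "disease": "stomach cancer"},
    {"age": "30-39", "zip": "479**", "disease": "flu"},
    {"age": "30-39", "zip": "479**", "disease": "gastritis"},
    {"age": "30-39", "zip": "479**", "disease": "bronchitis"},
]

groups = group_by_quasi_identifier(records, ["age", "zip"])
rate = similarity_attack_rate(groups)
print(f"similarity attack rate: {rate:.0%}")
```

In the toy data, the first group holds three distinct diseases that are all stomach-related, so an attacker learns the category even though no exact value is disclosed. The second group mixes categories and is not exposed under this measure.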