Protecting Data Privacy from Being Inferred from High Dimensional Correlated Data

Huafeng Ba,Xiaoming Gao,Xiaofeng Zhang,Zhenyu He
DOI: https://doi.org/10.1109/wi-iat.2014.139
2014-01-01
Abstract:In the era of big data, privacy becomes a challenging issue which already attracts a good number of research efforts. In the literature, most of existing privacy preserving algorithms focus on protecting users' privacy from being disclosed by making the set of designated semi-id features indiscriminate. However, how to automatically determine the appropriate semi-id features from high-dimensional correlated data is seldom studied. Therefore, in this paper we first theoretically study the problem and propose the IPFS algorithm to find all possible features forming the candidate semi-id feature set which can infer users' privacy. Then, the KIPFS algorithm is proposed to find the key features from the candidate semi-id feature set. By anonymizing the key feature set, called as key inferring privacy features (KIPFS), users' privacy is protected. To evaluate the effectiveness and the efficacy of the proposed approach, two state-of-the-art algorithms, i.e., K-anonymity and t-closeness, applied on the designated semi-id feature set are chose as the baseline algorithms and their revised versions are applied on the KIPFS for the performance comparison. The promising results showed that by anonymizing the identified KIPFS, both aforementioned algorithms can achieve better performance than the original ones in terms of efficiency and data quality.
What problem does this paper attempt to address?