Privacy Preserving Feature Selection in Distributed Environment

Wan Wenqiang, Zhang Lingwei
2012-01-01
Abstract: Privacy preservation and feature selection are both important in data mining, so selecting features effectively while preserving privacy has become an active research topic. Within the Map-Reduce distributed framework, this paper proposes a new privacy-preserving feature selection algorithm for distributed environments. The algorithm combines differential privacy and principal component analysis with statistics including entropy, misclassification gain, and the Gini index. It protects the privacy of both the data sets and the features. Simulation results on several benchmark data sets indicate that the algorithm performs well: while selecting the important features, it protects private information to a certain extent.
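The abstract does not give the algorithm's details, so the following is only an illustrative sketch of the named ingredients, not the authors' method. It assumes the Laplace mechanism as the differential privacy primitive and applies it to per-class counts (sensitivity 1) before computing entropy, the Gini index, and misclassification error. The function names `laplace_noise`, `noisy_counts`, and `impurities` and the parameter `epsilon` are hypothetical; the principal component analysis and Map-Reduce steps are omitted.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, rng):
    # Assumption: adding or removing one record changes one class count by 1,
    # so Laplace noise with scale 1/epsilon is used. Negative results are
    # clipped to zero so they can still be normalised into probabilities.
    return [max(0.0, c + laplace_noise(1.0 / epsilon, rng)) for c in counts]


def impurities(counts):
    # Compute the three statistics named in the abstract from class counts.
    total = sum(counts)
    if total <= 0:
        return {"entropy": 0.0, "gini": 0.0, "misclassification": 0.0}
    probs = [c / total for c in counts if c > 0]
    return {
        "entropy": -sum(p * math.log2(p) for p in probs),
        "gini": 1.0 - sum(p * p for p in probs),
        "misclassification": 1.0 - max(probs),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    exact = [40, 10]  # hypothetical class counts for one feature value
    print(impurities(exact))
    print(impurities(noisy_counts(exact, epsilon=1.0, rng=rng)))
```

Under these assumptions, a smaller `epsilon` adds more noise to the counts, so the statistics used to rank features become less accurate as the privacy guarantee becomes stronger.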