A Classification Method for Privacy-Preserved Data Using Bayesian Rule

Pan YANG, Xiaolin GUI, Jian AN, Feng TIAN, Gang WANG
DOI: https://doi.org/10.7652/xjtuxb201504008
2015-01-01
Abstract: A classification method for perturbed data based on the Bayesian rule is presented. It addresses the problem that data mining results are degraded when the retrievable general additive data perturbation (RGADP) algorithm is used to preserve privacy in a database. The process of the RGADP algorithm is analyzed, and the Bayesian rule is used to estimate the probability distribution of the original data from the perturbed data. New data are then reconstructed from the estimated probability distribution and classified, which increases classification accuracy. Experimental results show that the probability distribution estimated by the proposed method is close to the original probability distribution. Compared with classifying the perturbed data directly, classifying the reconstructed data increases accuracy by more than 4% on average and brings it closer to the accuracy obtained on the original data. Thus, the method effectively reduces the effect of the perturbation algorithm on classification. Moreover, the running time of the method is proportional to the amount of data and the number of groups. The method takes less than 200 ms to reconstruct 10,000 records and is therefore efficient.
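The estimate-then-reconstruct idea in the abstract can be illustrated with a minimal sketch. It is not the paper's RGADP-specific procedure, whose details the abstract does not give. It is a generic iterative Bayes-rule update for additively perturbed values, assuming zero-mean Gaussian noise with a known standard deviation `sigma` and a fixed grid of candidate original values `centers`. The function names `estimate_distribution` and `reconstruct` are hypothetical.

```python
import math
import random


def gaussian_pdf(x, sigma):
    """Density of zero-mean Gaussian noise (an assumed noise model)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def estimate_distribution(perturbed, centers, sigma, iterations=30):
    """Estimate P(original value == centers[j]) from perturbed observations.

    Starts from a uniform prior and repeatedly applies Bayes' rule:
    posterior(j | w) is proportional to noise_pdf(w - centers[j]) * prior[j].
    The new prior is the average posterior over all observations.
    """
    k = len(centers)
    probs = [1.0 / k] * k
    # likelihood[i][j] = noise density of observing perturbed[i] given centers[j]
    likelihood = [[gaussian_pdf(w - c, sigma) for c in centers] for w in perturbed]
    for _ in range(iterations):
        updated = [0.0] * k
        for row in likelihood:
            denom = sum(l * p for l, p in zip(row, probs))
            if denom <= 0.0:
                continue  # observation too far from every grid point; skip it
            for j in range(k):
                updated[j] += row[j] * probs[j] / denom
        total = sum(updated)
        probs = [v / total for v in updated]
    return probs


def reconstruct(centers, probs, n, rng):
    """Draw n new values from the estimated distribution."""
    return rng.choices(centers, weights=probs, k=n)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical original data: two groups of values.
    original = [rng.choice([20.0, 60.0]) for _ in range(500)]
    perturbed = [x + rng.gauss(0.0, 10.0) for x in original]
    centers = [float(c) for c in range(-20, 121, 10)]
    probs = estimate_distribution(perturbed, centers, sigma=10.0)
    rebuilt = reconstruct(centers, probs, len(original), rng)
    print(len(rebuilt), round(sum(probs), 6))
```

In this sketch each iteration visits every record once per grid point, so its cost grows with the number of records times the number of candidate values. A classifier would then be trained on the reconstructed values rather than on the perturbed ones.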