A Trajectory Data Publication Method under Differential Privacy
Zheng HUO, Xiao-Feng MENG
DOI: https://doi.org/10.11897/SP.J.1016.2018.00400
2018-01-01
Abstract: Trajectory data contain rich spatiotemporal information, and publishing them may leak users' personal privacy. Privacy-preserving techniques must therefore be applied before trajectory data are published. Most existing trajectory privacy-preserving techniques are based on the k-anonymity model, which has two main drawbacks: its privacy guarantee is weak, and it depends heavily on the background knowledge held by attackers. In recent years, researchers have proposed differential privacy, which is regarded as the strongest privacy-preserving model and is well suited to publishing statistical information. However, publishing statistical information may itself leak users' location privacy. We first propose two attack models against the publication of count values of location samples: the sparse location attack and the maximum moving speed attack. We then propose two trajectory data publication methods under differential privacy. In free space, we propose a differentially private publication method based on a noisy quad-tree: the privacy budget is divided among the levels of the quad-tree, and Laplace noise is added to the moving-object count values at each level. In road network space, we propose a differentially private publication method based on a noisy R-tree: the R-tree indexes road segments, the privacy budget is divided, and Laplace noise is added to the count values of each road segment or of the minimum bounding rectangles of road segments. Both methods publish noisy count values of the location data at each timestamp, which together form a sequence of count values. Differential privacy algorithms offer a stronger privacy guarantee than traditional k-anonymity methods. The published trajectory data are, in effect, a sequence of location count values over multiple consecutive timestamps. Because differential privacy here works by adding Laplace noise to the original data, and the noise added at each timestamp is independent, the published data may be inconsistent, which reduces their utility. This inconsistency arises when location data for two or more consecutive timestamps are published. We propose a heuristic algorithm to solve this problem, formulating it as an extreme value (optimization) problem under constraints. The constraints are the maximum and minimum counts for each area or road segment, derived from the moving speed and the area size or road segment length. The objective function is the L2 distance between the noisy count value sequence and the consistent count value sequence; minimizing it yields the consistent count value sequence closest to the noisy data. Finally, we conduct a set of experiments on two data sets: one is location data generated under a Gaussian distribution, and the other is data generated under the constraints of the Oldenburg city road network. We measure the data utility and runtime of each algorithm before and after the consistency procedure. The results show that the consistency algorithm improves data utility by about 200% on average, which demonstrates its effectiveness. We also test the runtime of each algorithm: as the data set size increases, the runtime increases linearly, which indicates that the algorithms scale well.
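
The two mechanisms the abstract describes, per-level budget division with Laplace noise and an L2-closest consistent sequence under count bounds, can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a uniform budget split across tree levels, a per-level sensitivity of 1 (each moving object falls in one cell per level), and independent lower and upper bounds per cell, in which case the L2-closest consistent vector is obtained by clipping. The function names `split_budget`, `noisy_level_counts`, and `project_to_bounds` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def split_budget(epsilon, levels):
    # Assumption: the total budget is split uniformly across tree levels.
    # The abstract only says the budget is divided by level.
    return [epsilon / levels] * levels


def noisy_level_counts(counts_per_level, epsilon, rng, sensitivity=1.0):
    # counts_per_level[i] holds the true cell counts at tree level i.
    # Each level gets its own share of the budget; the noise scale is
    # sensitivity / epsilon_i, so deeper splits mean noisier levels.
    budgets = split_budget(epsilon, len(counts_per_level))
    noisy = []
    for counts, eps in zip(counts_per_level, budgets):
        scale = sensitivity / eps
        noisy.append([c + laplace_noise(scale, rng) for c in counts])
    return noisy


def project_to_bounds(noisy, lower, upper):
    # Assumption: each cell has independent bounds [lower, upper].
    # Under such box constraints, the vector minimizing the L2 distance
    # to `noisy` is obtained by clipping each coordinate.
    return [min(max(v, lo), hi) for v, lo, hi in zip(noisy, lower, upper)]


# Usage: a two-level quad-tree (root, then four quadrants) at one timestamp.
rng = random.Random(0)
true_counts = [[10], [4, 3, 2, 1]]
noisy = noisy_level_counts(true_counts, epsilon=1.0, rng=rng)
consistent = project_to_bounds(noisy[1], [0, 0, 0, 0], [10, 10, 10, 10])
print(noisy[1])
print(consistent)
```

In the paper the bounds come from the moving speed and the area size or road segment length across consecutive timestamps, so the real constraints couple cells and timestamps; the clipping step above only covers the simplest, uncoupled case.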