Abstract: Human mobility trajectories are increasingly collected by ISPs to assist academic research and commercial applications. Meanwhile, there is growing concern that individual trajectories can be de-anonymized when the data is shared, using information from external sources (e.g., online social networks). To understand this risk, prior works either estimate the theoretical privacy bound or simulate de-anonymization attacks on synthetically created datasets. However, it is not clear how well the theoretical estimates hold in practice. In this article, we collected a large-scale ground-truth trajectory dataset from 2,161,500 users of a cellular network, and two matched external trajectory datasets from a large social network (56,683 users) and a check-in/review service (45,790 users) on the same user population. The two sets of large ground-truth data provide a rare opportunity to extensively evaluate a variety of de-anonymization algorithms (nine in total). We find that their performance on the real-world dataset is far from the theoretical bound. Further analysis shows that most algorithms underestimate the impact of spatio-temporal mismatches between the data from different sources, and that the high sparsity of user-generated data also contributes to the under-performance. Based on these insights, we propose four new algorithms that are specially designed to tolerate spatial or temporal mismatches (or both) and to model location contexts and time contexts. Extensive evaluations show that our algorithms achieve more than a 17 percent performance gain over the best existing algorithms, confirming our insights. Further, we propose two new location-privacy preserving mechanisms that utilize the spatio-temporal mismatches to better protect users' privacy against the de-anonymization attack. Evaluation results show that our proposed mechanisms can reduce the performance of de-anonymization attacks by over 8.0 percent, demonstrating the effectiveness of our insights.
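To make the notion of tolerating spatio-temporal mismatches concrete, the following is a minimal illustrative sketch, not one of the algorithms proposed in the article. It assumes each trajectory is a list of `(timestamp_seconds, latitude, longitude)` records; the function names, the tolerance parameters `max_km` and `max_seconds`, and the scoring rule (the fraction of external records that have a nearby anonymized record) are all hypothetical choices made for illustration.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def tolerant_match_score(external, candidate, max_km=1.0, max_seconds=3600):
    """Fraction of external records that have at least one candidate record
    within BOTH the spatial and the temporal tolerance (hypothetical rule)."""
    if not external:
        return 0.0
    hits = 0
    for t_e, lat_e, lon_e in external:
        for t_c, lat_c, lon_c in candidate:
            close_in_time = abs(t_e - t_c) <= max_seconds
            close_in_space = haversine_km(lat_e, lon_e, lat_c, lon_c) <= max_km
            if close_in_time and close_in_space:
                hits += 1
                break  # one nearby record is enough for this external record
    return hits / len(external)


def best_candidate(external, anonymized, **tolerances):
    """Return the anonymized user ID whose trajectory scores highest."""
    return max(
        anonymized,
        key=lambda uid: tolerant_match_score(external, anonymized[uid], **tolerances),
    )


# Toy data: an external (e.g., social network) trace and two anonymized traces.
external = [(0, 39.90, 116.40), (7200, 39.95, 116.45)]
anonymized = {
    "a": [(600, 39.901, 116.401), (7000, 39.951, 116.451)],
    "b": [(600, 31.20, 121.50)],
}
print(best_candidate(external, anonymized))
```

In this toy setting, requiring exact temporal agreement (`max_seconds=0`) drops the score of every candidate to zero, which is the kind of failure that the mismatch analysis in the abstract attributes to existing algorithms. The tolerances here are arbitrary and would need to be tuned to the data sources involved.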

Privacy-Preserving Internet Traffic Publication

An Enhanced K-Anonymity Model Against Homogeneity Attack

Enhancing Sink-Location Privacy in Wireless Sensor Networks Through K-Anonymity

Network Coding Based Privacy Preservation Against Traffic Analysis in Multi-Hop Wireless Networks

Privacy Risk in Anonymized Heterogeneous Information Networks

K-Anonymity for Crowdsourcing Database

An IP Address Anonymization Scheme with Multiple Access Levels

Clustering-Based k-anonymity

On the Utility of Anonymized Flow Traces for Anomaly Detection

Relationship Privacy Leakage in Network Traffics

Bits Learning: User-Adjustable Privacy Versus Accuracy in Internet Traffic Classification

A Top-Down Approach For Approximate Data Anonymisation

Towards publishing directed social network data with k‐degree anonymization

A Trajectory K-Anonymity Model Based on Point Density and Partition

A systematic comparison of measures for k-anonymity in networks

(K,p)-Anonymity

Traffic Information Publication with Privacy Preservation

Anonymous Traffic Detection Based on Feature Engineering and Reinforcement Learning

Data De-anonymization: From Mobility Traces to On-line Social Networks

kIP: a Measured Approach to IPv6 Address Anonymization

Anonymization and De-Anonymization of Mobility Trajectories: Dissecting the Gaps Between Theory and Practice