Exploring Anonymous User Reviews: Linkability Analysis Based on Machine Learning

Cheng Huang, Jianbing Ni, Rongxing Lu, Xuemin (Sherman) Shen
DOI: https://doi.org/10.1109/globecom38437.2019.9013509
2019-01-01
Abstract: Identity anonymization is widely regarded as a common mechanism for protecting users' privacy on public review platforms, since each user's unique identifier is removed to ensure pseudonymity and unlinkability. However, the effectiveness of this mechanism is unclear: whether an adversary can link anonymous reviews written by the same user has not been well studied. In this paper, we explore this issue by means of machine learning techniques. Specifically, we first extract major features from anonymous reviews and propose adaptive metrics to measure their effectiveness. We then use these features and several machine learning methods to link anonymous reviews created by the same user in a real-world dataset. Because different adversaries have different background knowledge, both supervised and unsupervised methods, such as random forest and hierarchical agglomerative clustering, are designed and used to perform linkability attacks. The simulation results demonstrate that the supervised methods perform well: almost 40% of anonymous reviews can be accurately linked to users if the adversary has background knowledge. The unsupervised methods, by comparison, perform poorly: it is difficult for an adversary without background knowledge to link anonymous reviews with high probability.
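To make the unsupervised setting concrete, the following is a minimal sketch of a linkability attack by hierarchical agglomerative clustering. It is not the authors' implementation: the abstract does not list the features or the linkage criterion, so the three stylometric features, average linkage, the toy reviews, and all function names here are assumptions chosen for illustration. The supervised setting (e.g. random forest) is not shown.

```python
import math
import string
from itertools import combinations

def features(text):
    """Toy stylometric vector (hypothetical features, not the paper's)."""
    words = text.split()
    n_chars = max(len(text), 1)
    n_words = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n_words,                   # mean token length
        sum(c in string.punctuation for c in text) / n_chars,   # punctuation rate
        sum(c.isupper() for c in text) / n_chars,                # uppercase rate
    ]

def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of row indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(math.dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the closest pair; a < b, so deleting index b is safe.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Made-up anonymous reviews: indices 0-1 share one style, 2-3 another.
reviews = [
    "GREAT food!!! LOVED it!!!",
    "BEST pizza EVER!!! SO good!!!",
    "the restaurant offered consistently excellent service throughout",
    "remarkably comfortable atmosphere considering weekend crowding",
]
clusters = agglomerate(standardize([features(r) for r in reviews]), n_clusters=2)
print(clusters)
```

Each resulting cluster is the adversary's guess at a set of reviews written by one user. On real data the number of users is unknown and writing styles overlap far more than in this toy input, which is consistent with the weak unsupervised results reported in the abstract.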