Resisting Tag Spam by Leveraging Implicit User Behaviors

Ennan Zhai, Zhenhua Li, Zhenyu Li, Fan Wu, Guihai Chen
DOI: https://doi.org/10.14778/3021924.3021939
IF: 2.5
2016-01-01
Proceedings of the VLDB Endowment
Abstract: Tagging systems are vulnerable to tag spam attacks. However, defending against tag spam has been challenging in practice, since adversaries can easily launch spam attacks in various ways and at various scales. To better understand users' tagging behaviors and explore more effective defenses, this paper first conducts measurement experiments on public datasets of two representative tagging systems: Del.icio.us and CiteULike. Our key finding is that a significant fraction of correct tag-resource annotations are contributed by a small number of implicit similarity cliques, where users annotate common resources with similar tags. Guided by this finding, we propose a new service, called Spam-Resistance-as-a-Service (or SRaaS), to effectively defend against heterogeneous tag spam attacks even at very large scales. At the heart of SRaaS is a novel reputation assessment protocol, whose design leverages the implicit similarity cliques coupled with the social networks inherent to typical tagging systems. With such a design, SRaaS offers provable guarantees on diminishing the influence of tag spam attacks. We build an SRaaS prototype and evaluate it using a large-scale spam-oriented research dataset (which is much more polluted by tag spam than the Del.icio.us and CiteULike datasets). Our evaluation results demonstrate that SRaaS outperforms existing tag spam defenses deployed in real-world systems, while introducing low overhead.