Identifying Web Spam with the Wisdom of the Crowds

Yiqun Liu, Fei Chen, Weize Kong, Huijia Yu, Min Zhang, Shaoping Ma, Liyun Ru
DOI: https://doi.org/10.1145/2109205.2109207
2012-01-01
Abstract: Combating Web spam has become one of the top challenges for Web search engines. State-of-the-art spam-detection techniques are usually designed for specific, known types of Web spam and are incapable of dealing with newly appearing spam types efficiently. With user-behavior analyses from Web access logs, a spam page-detection algorithm is proposed based on a learning scheme. The main contributions are the following. (1) User-visiting patterns of spam pages are studied, and a number of user-behavior features are proposed for separating Web spam pages from ordinary pages. (2) A novel spam-detection framework is proposed that can detect various kinds of Web spam, including newly appearing ones, with the help of the user-behavior analysis. Experiments on large-scale practical Web access log data show the effectiveness of the proposed features and the detection framework.