Ascertaining Spam Web Pages Based on Ant Colony Optimization Algorithm.

Shou-Hong Tang, Yan Zhu, Fan Yang, Qing Xu
DOI: https://doi.org/10.1007/978-3-319-10085-2_21
2014-01-01
Abstract: Web spam troubles both Internet users and search engine companies because it seriously damages the reliability of search engines, harms the interests of Web users, and degrades the quality of Web information. This paper discusses a Web spam detection method inspired by the Ant Colony Optimization (ACO) algorithm. The approach consists of two stages: preprocessing and Web spam detection. In the preprocessing stage, the class-imbalance problem is addressed with a clustering technique, and an optimal feature subset is selected using Chi-square statistics. The dataset is also discretized using an information-entropy method. These steps make spam detection in the second stage easier and more efficient. In the second stage, a spam detection model is built based on the ant colony optimization algorithm. Experimental results on the WEBSPAM-UK2006 dataset show that the approach achieves the same or even better results with fewer features.
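As an illustration of the Chi-square feature selection mentioned in the abstract, the following is a minimal Python sketch. It assumes features have already been discretized and that pages carry a binary spam/normal label. The function names `chi_square` and `top_k_features` and the top-k selection rule are illustrative assumptions, not the authors' implementation, and the abstract does not give the exact procedure used in the paper.

```python
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic between one discrete feature and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))   # observed counts per (value, label) cell
    value_totals = Counter(feature)         # row totals
    label_totals = Counter(labels)          # column totals
    score = 0.0
    for value, value_count in value_totals.items():
        for label, label_count in label_totals.items():
            expected = value_count * label_count / n
            observed = joint.get((value, label), 0)
            score += (observed - expected) ** 2 / expected
    return score


def top_k_features(columns, labels, k):
    """Return the names of the k features with the highest chi-square score.

    `columns` maps a (hypothetical) feature name to its list of discretized values.
    """
    scores = {name: chi_square(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A feature whose values are independent of the label scores close to zero and is dropped first, while a feature that separates spam from normal pages scores high and is kept for the detection stage.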