Poster: CUD: Crowdsourcing for URL Spam Detection

Jun Hu, Hongyu Gao, Zhichun Li, Yan Chen
DOI: https://doi.org/10.1145/2046707.2093493
2011-01-01
Abstract: The prevalence of spam URLs in Internet services such as email, social networks, blogs, and online forums has become a serious problem. These spam URLs host spam advertisements, phishing attempts, and malware, all of which harm normal users. Existing URL blacklist approaches offer limited protection. Although recent machine-learning-based URL classification approaches demonstrate good accuracy and reasonable throughput, they are built on observations of existing spam URLs and have difficulty detecting new spam URLs when attackers employ new strategies.

In this paper, we present CUD (Crowdsourcing for URL spam detection) as a supplement to existing detection tools. CUD leverages human intelligence for URL classification through crowdsourcing: it crawls existing online user comments about spam URLs and applies sentiment analysis from natural language processing to analyze those comments automatically and detect spam URLs. Because CUD does not use features directly associated with the URLs or their landing pages, it is more robust when attackers change their strategies. In our evaluation, we find that up to 70% of URLs have user comments online. CUD achieves an accuracy (true positive rate) of 86.8% with a false positive rate of 0.9%. Moreover, about 75% of the spam URLs that CUD detects are missed by other approaches, so CUD can serve as a good complement to them.
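The abstract does not say which sentiment-analysis method CUD uses, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: each crawled comment is scored with a tiny hand-picked word lexicon (the word lists, the `threshold` value, and the function names `comment_score`, `classify_url`, and `rates` are all hypothetical), the scores are averaged per URL, and a sufficiently negative average marks the URL as spam. URLs with no comments are left as "unknown", reflecting that CUD is meant as a complement to other detectors. The `rates` helper shows how the true and false positive rates quoted above are defined.

```python
# Hypothetical lexicons for illustration only; not the paper's actual method.
NEGATIVE_WORDS = {"spam", "scam", "phishing", "malware", "virus", "fake", "fraud"}
POSITIVE_WORDS = {"safe", "legit", "legitimate", "useful", "great", "trusted"}


def comment_score(comment: str) -> int:
    """Return (#positive words - #negative words) for a single comment."""
    # Lowercase and strip surrounding punctuation from each whitespace token.
    tokens = [t.strip(".,!?;:\"'()").lower() for t in comment.split()]
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    return pos - neg


def classify_url(comments: list[str], threshold: float = -0.5) -> str:
    """Label a URL from the user comments crawled about it (assumed threshold)."""
    if not comments:
        return "unknown"  # no crowd evidence; defer to other detectors
    avg = sum(comment_score(c) for c in comments) / len(comments)
    return "spam" if avg <= threshold else "benign"


def rates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """True positive rate = TP/(TP+FN); false positive rate = FP/(FP+TN)."""
    return tp / (tp + fn), fp / (fp + tn)


if __name__ == "__main__":
    print(classify_url(["This link is a scam, do not click",
                        "phishing site, fake login page"]))  # spam
    print(classify_url(["Great tutorial, very useful"]))     # benign
    print(classify_url([]))                                  # unknown
```

A word-count lexicon like this ignores negation ("not safe" would count as positive), which is one reason a real system would use a proper sentiment-analysis model; the sketch only illustrates why the approach needs no features of the URL or its landing page.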
What problem does this paper attempt to address?