Dynamic detection of spammers in Weibo

Zhang Cheng, Niu Kai, He Zhiqiang
DOI: https://doi.org/10.1109/ICNIDC.2014.7000276
2014-01-01
Abstract: Social networks have reached maturity; Weibo filed for an initial public offering (IPO) with the U.S. Securities and Exchange Commission (SEC) on March 15th, 2014. Spamming, however, is a long-standing problem on the Internet that dates back to Web 1.0. With the rapid growth of social networks, spam has emerged in social network services and become more complex. Spammers post feeds that contain typical phrases from a trending topic along with URLs that are usually unrelated to the feed content. These URLs lead users to particular websites, usually Taobao shop sites, from which the spammers earn money. Building mechanisms that automatically detect and stop spammers is therefore an urgent task. Prior research has addressed this problem, but because spammers and anti-spam systems are locked in a continual contest, spammers' behavior patterns change constantly. In this paper we present an adaptive framework that detects spammers dynamically by using incremental learning with a machine learning algorithm. To identify spammers, we systematically analyze Weibo user behaviors and find behavior patterns that differ between spammers and legitimate users. To collect a large set of Weibo user samples, we applied for a high-privilege developer account and devised an effective collection method using the Weibo open platform. We collected a large Sina Weibo dataset comprising 30 million users, 46 million feeds, and 980 million links. By checking users' tweeting behaviors, we manually gathered training samples that include both spammers and legitimate users. We then compared the social behavior characteristics of spammers with those of legitimate users. These characteristics are used in our framework to separate spammers from legitimate users. Tests on real data show that this approach can effectively identify spammers in Weibo.
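The incremental-learning idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the algorithm or the features used, so everything here is an assumption for illustration: the model is a plain online logistic regression, the two features (fraction of feeds containing a URL, fraction of feeds using trending-topic phrases) are hypothetical, and the data is synthetic rather than drawn from the Weibo dataset.

```python
import math
import random


class OnlineLogisticRegression:
    """Logistic regression updated one labeled sample at a time (SGD).

    Illustrative stand-in for an incremental learner; not the paper's model.
    """

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Return the estimated probability that x belongs to a spammer."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, x, y):
        """Update the model with a single sample (y is 1 for spammer, 0 otherwise)."""
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def make_sample(rng, is_spammer):
    """Generate a synthetic user with two hypothetical behavior features:
    [fraction of feeds with a URL, fraction of feeds with trending phrases]."""
    if is_spammer:
        return [rng.uniform(0.7, 1.0), rng.uniform(0.6, 1.0)]
    return [rng.uniform(0.0, 0.3), rng.uniform(0.0, 0.4)]


rng = random.Random(0)
model = OnlineLogisticRegression(n_features=2)

# Labeled samples arrive as a stream; the model is updated after each one
# instead of being retrained from scratch.
for _ in range(3000):
    label = rng.randint(0, 1)
    model.partial_fit(make_sample(rng, bool(label)), label)

spam_score = model.predict_proba([0.9, 0.9])
legit_score = model.predict_proba([0.1, 0.1])
```

Because `partial_fit` touches only the current sample, the same call can keep running as newly labeled users arrive, which is one way a detector could follow spammers' changing behavior without a full retraining pass.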