Spam comments detection with self-extensible dictionary and text-based features

Qiang Zhang, Chenwei Liu, Shangru Zhong, Kai Lei
DOI: https://doi.org/10.1109/ISCC.2017.8024692
2017-01-01
Abstract: New social media have become popular channels for spreading information, allowing online users to post about the latest events and share personal opinions. However, massive volumes of spam comments seriously degrade users' reading experience. To detect spam comments in Chinese social media, we employ semantic analysis to build a self-extensible dictionary that automatically updates and extends itself with new cyber words. The semantic analysis also provides extra semantic features that help in text classification. Based on a statistical analysis of microblogging comments, we select four text-based features that represent the basic characteristics of Chinese spam comments. We use the spam dictionary and the text-based features to construct classifiers for detecting spam comments. We achieve an average detection accuracy of 93.6%, which compares favorably with existing spam comment detection methods. Experimental results demonstrate that our method can effectively detect spam comments in Chinese microblogging.
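The pipeline described in the abstract (a dictionary that grows over time, plus a small set of text-based features fed to a classifier) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not name the four features, so `dict_hit_ratio`, `length`, `url_count`, and `digit_ratio` are hypothetical placeholders. The dictionary extension uses simple co-occurrence counting as a stand-in for the paper's semantic analysis, and the regex tokenizer is a stand-in for a Chinese word segmenter.

```python
import re
from collections import Counter


def tokenize(text):
    # Stand-in tokenizer (assumption): real Chinese text needs a word segmenter.
    return re.findall(r"\w+", text.lower())


def extend_dictionary(spam_words, labeled_spam, min_count=2):
    """Return spam_words plus tokens that co-occur with known spam words
    in at least min_count spam comments.

    Co-occurrence counting is an assumed substitute for semantic analysis.
    """
    counts = Counter()
    for comment in labeled_spam:
        tokens = set(tokenize(comment))
        if tokens & spam_words:
            # Count each candidate token once per comment.
            counts.update(tokens - spam_words)
    return spam_words | {w for w, c in counts.items() if c >= min_count}


def extract_features(comment, spam_words):
    """Build a feature dict; all four feature names are hypothetical."""
    tokens = tokenize(comment)
    n_tokens = max(len(tokens), 1)
    n_chars = max(len(comment), 1)
    return {
        "dict_hit_ratio": sum(t in spam_words for t in tokens) / n_tokens,
        "length": len(comment),
        "url_count": len(re.findall(r"https?://\S+", comment)),
        "digit_ratio": sum(ch.isdigit() for ch in comment) / n_chars,
    }


if __name__ == "__main__":
    seed = {"discount"}
    spam = ["discount coupon here", "big discount coupon", "hello friend"]
    dictionary = extend_dictionary(seed, spam)
    print(sorted(dictionary))
    print(extract_features("discount coupon 123 http://a.b", dictionary))
```

The resulting feature dictionaries could be passed to any standard classifier. The abstract does not say which classifiers the authors used, so none is assumed here.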