Identification of micro-blog advertising publisher based on clustering analysis

Xingyu ZHAO, Zhihong ZHAO, Yepei WANG, Songyu CHEN
DOI: https://doi.org/10.11772/j.issn.1001-9081.2017102478
2018-01-01
Abstract: Micro-blog platforms contain a large amount of advertising content, which seriously degrades user experience and interferes with related research. Much existing research on micro-blog processing relies on classification algorithms such as Support Vector Machine (SVM) and random forest. However, these classification methods require manually labeled data, which is difficult to produce for large volumes of micro-blogs. A micro-blog advertisement publisher identification method based on clustering analysis was proposed. In the user dimension, the concept of a core micro-blog was introduced to counter the practice of posting ordinary micro-blogs to dilute advertising content. The main themes of each user and the corresponding micro-blog sequences were then extracted and used to compute user features as well as text features. Finally, a clustering algorithm was applied to these features to identify micro-blog advertisers. Experimental results show a precision of 93%, a recall of 97%, and an F value of 95%, indicating that the proposed method can accurately identify micro-blog advertisement publishers even when the advertising content is artificially diluted. The method provides theoretical support and practical techniques for recognizing and cleaning micro-blog spam.
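The final stage of the pipeline (cluster per-user feature vectors, pick the advertiser cluster, then score it) can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means with deterministic farthest-first seeding is swapped in as an assumption; the two features, the user ids, and the toy ground truth are all hypothetical; and the rule "the cluster whose centroid has the larger feature sum is the advertiser cluster" is an assumed labeling heuristic. The `f_measure` helper assumes the reported F value is the balanced F-measure, F = 2PR / (P + R), which does reproduce roughly 95% from the reported 93% precision and 97% recall.

```python
import math

# Hypothetical per-user feature vectors in [0, 1]; the paper's real user and
# text features are not specified in the abstract.
# Feature 0 (assumed): share of a user's posts that belong to the main theme.
# Feature 1 (assumed): share of core micro-blogs carrying promotional text.
USERS = {
    "u1": [0.90, 0.85],
    "u2": [0.80, 0.90],
    "u3": [0.95, 0.75],
    "u4": [0.10, 0.20],
    "u5": [0.15, 0.05],
    "u6": [0.20, 0.10],
}
TRUE_ADVERTISERS = {"u1", "u2", "u3"}  # toy ground truth, for evaluation only


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's k-means (an assumed stand-in for the paper's clustering step).
    Returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def identify_advertisers(users):
    """Split users into two clusters; assume the cluster whose centroid has the
    larger feature sum is the advertiser cluster and return its user ids."""
    ids = list(users)
    labels, centroids = kmeans([users[u] for u in ids], k=2)
    ad_cluster = max(range(2), key=lambda c: sum(centroids[c]))
    return {u for u, lab in zip(ids, labels) if lab == ad_cluster}


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(predicted, actual):
    """Precision, recall and F-measure of a predicted advertiser set."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    predicted = identify_advertisers(USERS)
    print(sorted(predicted), precision_recall_f(predicted, TRUE_ADVERTISERS))
    # Consistency check on the reported figures (P = 0.93, R = 0.97).
    print(round(f_measure(0.93, 0.97), 4))  # 0.9496, i.e. about 95%
```

On the toy data the two groups are well separated, so the sketch recovers `u1`, `u2`, `u3` exactly; real users who dilute advertisements with ordinary posts would sit closer to the boundary, which is the situation the core micro-blog concept is meant to address. Deterministic seeding is used only so the example gives the same result on every run.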