A Novel Hybrid Clustering Algorithm for Topic Detection on Chinese Microblogging

Xiao Geng, Yanmei Zhang, Yuhang Jiao, Yinan Mei
DOI: https://doi.org/10.1109/tcss.2019.2897641
2017-01-01
AIP Conference Proceedings
Abstract: The hot topics discussed on microblogs mirror public opinion, so topic detection on microblogs is of great significance for detecting and managing public opinion. However, traditional clustering algorithms struggle with large-scale microblogging data that covers many topics and contains heavy noise. We therefore propose a three-layer hybrid algorithm to tackle this problem. In the first layer, we use the K-means algorithm with optimized initial center selection to group the microblog texts efficiently. We then subdivide big clusters and isolate noise texts to obtain purer clusters. In the second layer, we adopt the agglomerative nesting (AGNES) algorithm to merge the small clusters that refer to the same topic. We then exclude most of the noise, reducing its further impact on the K-means step in the third layer, which corrects the erroneous merges made by AGNES. Experiments show that our algorithm outperforms several related traditional algorithms at clustering a real microblogging data set and performs well in topic detection.
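The three layers described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the farthest-first seeding (standing in for the paper's optimized center selection, which the abstract does not specify), the Euclidean distance, the size-based noise rule, and the merge threshold are all assumptions made for illustration. The subdivision of big clusters is omitted. Points are plain 2-D tuples here; real microblog texts would first need to be turned into vectors.

```python
import math

def dist(a, b):
    # Euclidean distance (assumption; the abstract does not name a metric).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def farthest_first(points, k):
    # Hypothetical stand-in for the paper's optimized initial center selection.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in points]

def kmeans(points, centers, iters=20):
    centers = list(centers)
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = mean(members)
    return assign(points, centers), centers

def agnes_merge(clusters, threshold):
    # Bottom-up merging: repeatedly join the two clusters with the closest
    # centroids until the closest pair is farther apart than `threshold`.
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        pairs = [(dist(mean(clusters[i]), mean(clusters[j])), i, j)
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        d, i, j = min(pairs)
        if d > threshold:
            break
        clusters[i].extend(clusters.pop(j))
    return clusters

def hybrid_cluster(points, k_over, min_size, merge_threshold):
    # Layer 1: over-cluster with K-means, set tiny clusters aside as noise.
    labels, _ = kmeans(points, farthest_first(points, k_over))
    groups = [[p for p, l in zip(points, labels) if l == j] for j in range(k_over)]
    kept = [g for g in groups if len(g) >= min_size]
    noise = [p for g in groups if len(g) < min_size for p in g]
    # Layer 2: merge small clusters that appear to share a topic.
    merged = agnes_merge(kept, merge_threshold)
    # Layer 3: K-means seeded with the merged centroids refines the result.
    clean = [p for g in merged for p in g]
    final_labels, _ = kmeans(clean, [mean(g) for g in merged])
    return clean, final_labels, noise

# Toy data: two nearby clumps (one topic), one distant clump, one outlier.
points = [(0, 0), (0, 1), (3, 0), (3, 1),
          (20, 20), (20, 21), (21, 20), (100, 100)]
clean, labels, noise = hybrid_cluster(points, k_over=4, min_size=2, merge_threshold=5)
print(labels, noise)
```

On this toy input, the first layer splits the two nearby clumps, the second layer merges them back into one cluster, and the isolated point is set aside as noise before the final K-means pass.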