Unsupervised Web Topic Detection Using A Ranked Clustering-Like Pattern Across Similarity Cascades
Fei Jia, Junbiao Pang, Weigang Zhang, Guorong Li, Chunjie Zhang, Qingming Huang, Yugui Liu
DOI: https://doi.org/10.1109/tmm.2015.2425143
IF: 7.3
2015-01-01
IEEE Transactions on Multimedia
Abstract: With the massive growth of social media on the Internet, organizing, understanding, and monitoring user-generated content (UGC) has become one of the most pressing problems in today's society. Discovering topics on the web from a huge volume of UGC is a promising approach to this goal. Compared with classical topic detection and tracking in news articles, identifying topics on the web is considerably harder because data on the Internet are noisy, sparse, and less constrained. In this paper, we investigate methods from the perspective of similarity diffusion and propose a clustering-like pattern across similarity cascades (SCs). SCs are a series of subgraphs generated by truncating a similarity graph with a set of thresholds; maximal cliques in these subgraphs are then used to capture candidate topics. Finally, a topic-restricted similarity diffusion process is proposed to efficiently identify real topics from a large number of candidates. Experiments demonstrate that our approach outperforms the state-of-the-art methods on three public data sets.
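The candidate-generation step described in the abstract (truncate a similarity graph at several thresholds, then collect maximal cliques in each resulting subgraph) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the dictionary representation of the similarity graph, the `min_size` filter, and the use of the Bron-Kerbosch algorithm with pivoting to enumerate maximal cliques. The topic-restricted similarity diffusion step that ranks candidates is not sketched here, since the abstract does not specify it.

```python
def truncate_graph(similarity, threshold):
    """Build an adjacency map keeping only edges with similarity >= threshold.

    `similarity` maps unordered node pairs (u, v) to a similarity score.
    This representation is an assumption for illustration only.
    """
    adjacency = {}
    for (u, v) in similarity:
        adjacency.setdefault(u, set())
        adjacency.setdefault(v, set())
    for (u, v), score in similarity.items():
        if score >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pick the pivot with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda n: len(adjacency[n] & p))
        for v in list(p - adjacency[pivot]):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return cliques


def cascade_candidates(similarity, thresholds, min_size=2):
    """Map each threshold to the maximal cliques of its truncated subgraph.

    Cliques smaller than `min_size` (a hypothetical filter) are dropped.
    """
    candidates = {}
    for t in sorted(thresholds):
        adjacency = truncate_graph(similarity, t)
        candidates[t] = {c for c in maximal_cliques(adjacency)
                         if len(c) >= min_size}
    return candidates


if __name__ == "__main__":
    # Toy similarity graph over four hypothetical documents.
    toy = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.7, ("c", "d"): 0.4}
    for t, cliques in cascade_candidates(toy, [0.3, 0.6, 0.85]).items():
        print(t, sorted(sorted(c) for c in cliques))
```

In this toy graph, raising the threshold first drops the weak `c`-`d` link and then shrinks the `a`, `b`, `c` group to its strongest pair, which is the kind of pattern across thresholds that a cascade exposes. Enumerating maximal cliques is exponential in the worst case, so a sketch like this is only suitable for small graphs.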