Topic Detection Based on Group Average Hierarchical Clustering.

Ni Gao, Ling Gao, Yiyue He, Hai Wang, Qian Sun
DOI: https://doi.org/10.1109/cbd.2013.38
2013-01-01
Abstract: By analyzing the characteristics of the large volume of disaster news on the internet, this paper proposes a new topic detection algorithm based on Group Average Hierarchical Clustering (GAHC) that is suitable for processing big data on the network. The core idea of GAHC is to divide big data into smaller groups and then cluster the groups hierarchically to generate the final topics. During clustering, the vector space model is used to represent news documents, and a similarity calculation model based on time and place weights is proposed. The new algorithm can automatically organize similar disaster news materials and generate news topics; it can also provide personalized service for users and form a topic detection system for disaster news. Experimental results demonstrate that the performance of the algorithm is good.
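The abstract does not give the similarity formula or the grouping rule, so the following Python sketch is only one possible reading of the idea, not the authors' implementation. It assumes raw term-frequency vectors with cosine similarity for content, a linear time decay, an exact-match place score, and fixed-size groups. The weights `w_time` and `w_place`, the `horizon_days` window, the `group_size`, and the merge `threshold` are all hypothetical parameters chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def make_doc(doc_id, text, day, place):
    """Represent a news item as a term-frequency vector plus time and place."""
    return {"id": doc_id, "vec": Counter(text.lower().split()),
            "day": day, "place": place}


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(d1, d2, w_time=0.2, w_place=0.2, horizon_days=7):
    """Weighted mix of content, time, and place similarity (assumed form)."""
    content = cosine(d1["vec"], d2["vec"])
    time_sim = max(0.0, 1.0 - abs(d1["day"] - d2["day"]) / horizon_days)
    place_sim = 1.0 if d1["place"] == d2["place"] else 0.0
    return (1.0 - w_time - w_place) * content + w_time * time_sim + w_place * place_sim


def group_average(c1, c2):
    """Group-average linkage: mean pairwise similarity across two clusters."""
    return sum(similarity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def agglomerate(clusters, threshold):
    """Merge the most similar pair of clusters until none reaches the threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best_pair, best_sim = None, -1.0
        for i, j in combinations(range(len(clusters)), 2):
            s = group_average(clusters[i], clusters[j])
            if s > best_sim:
                best_pair, best_sim = (i, j), s
        if best_sim < threshold:
            break
        i, j = best_pair
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def detect_topics(docs, group_size=2, threshold=0.5):
    """Stage 1: cluster inside small groups. Stage 2: cluster the results."""
    partial = []
    for start in range(0, len(docs), group_size):
        group = docs[start:start + group_size]
        partial.extend(agglomerate([[d] for d in group], threshold))
    return agglomerate(partial, threshold)


# Made-up sample items, for illustration only.
docs = [
    make_doc(1, "earthquake hits sichuan province", day=0, place="sichuan"),
    make_doc(2, "flood damages hunan villages", day=10, place="hunan"),
    make_doc(3, "sichuan earthquake rescue continues", day=1, place="sichuan"),
    make_doc(4, "hunan flood relief arrives", day=11, place="hunan"),
]
topics = detect_topics(docs)
for topic in topics:
    print(sorted(d["id"] for d in topic))
```

Clustering inside small groups first keeps the quadratic pairwise comparison cheap in the first stage, and the second stage then only compares the partial clusters. Whether the paper partitions the data by fixed size, by time window, or by some other rule is not stated in the abstract.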