A On-Line News Documents Clustering Method

Hui Zhang, Guo-hui Li, Xin-wen Xu
DOI: https://doi.org/10.1007/978-3-642-35236-2_9
2012-01-01
Abstract: To improve the efficiency and accuracy of on-line news event detection (ONED), we build the vector space model of each news document from only those words whose term frequency (TF) exceeds a threshold, and we propose a two-stage clustering method for ONED. The method divides the detection process into two stages. In the first stage, similar documents collected within a given time period are clustered into micro-clusters. In the second stage, these micro-clusters are compared with previously formed event clusters. Experimental results show that the proposed method reduces the computational load and runs faster, with little loss of accuracy.
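The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the tokenizer, the similarity measure, the thresholds, or how clusters are represented, so everything below is an assumption: whitespace tokenization, cosine similarity, single-pass threshold clustering, and cluster centroids stored as the sum of member TF vectors. The names `tf_vector`, `micro_cluster`, `update_events`, `TF_THRESHOLD`, `MICRO_SIM`, and `EVENT_SIM` are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math

# Hypothetical parameters: the abstract gives no concrete values.
TF_THRESHOLD = 1   # keep only words whose term frequency exceeds this
MICRO_SIM = 0.5    # stage-1 similarity threshold (assumed)
EVENT_SIM = 0.3    # stage-2 similarity threshold (assumed)


def tf_vector(text, tf_threshold=TF_THRESHOLD):
    """Sparse TF vector keeping only words with TF > tf_threshold."""
    counts = Counter(text.lower().split())
    return {w: c for w, c in counts.items() if c > tf_threshold}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(items, clusters, threshold):
    """Add each (vector, labels) item to its most similar cluster if the
    similarity reaches threshold, otherwise open a new cluster.
    A cluster is {"centroid": dict, "members": list}; the centroid is the
    sum of member vectors (cosine similarity ignores the scale)."""
    for vec, labels in items:
        best, best_sim = None, -1.0
        for c in clusters:
            s = cosine(vec, c["centroid"])
            if s > best_sim:
                best, best_sim = c, s
        if best is None or best_sim < threshold:
            best = {"centroid": {}, "members": []}
            clusters.append(best)
        for k, v in vec.items():  # accumulate into the summed centroid
            best["centroid"][k] = best["centroid"].get(k, 0) + v
        best["members"].extend(labels)
    return clusters


def micro_cluster(docs, threshold=MICRO_SIM):
    """Stage 1: cluster the documents of one time window into micro-clusters."""
    items = [(tf_vector(text), [doc_id]) for doc_id, text in docs]
    return single_pass(items, [], threshold)


def update_events(micro_clusters, events, threshold=EVENT_SIM):
    """Stage 2: compare each micro-cluster centroid with earlier event clusters;
    a micro-cluster that matches none of them starts a new event."""
    items = [(m["centroid"], m["members"]) for m in micro_clusters]
    return single_pass(items, events, threshold)


# Usage: two time windows of toy documents (made-up data).
docs_t1 = [
    ("d1", "flood flood river river city"),
    ("d2", "flood flood river river rain"),
    ("d3", "election election vote vote poll"),
]
docs_t2 = [
    ("d4", "river river flood flood bridge"),
    ("d5", "storm storm coast coast wind"),
]
events = update_events(micro_cluster(docs_t1), [])
events = update_events(micro_cluster(docs_t2), events)
print([e["members"] for e in events])  # [['d1', 'd2', 'd4'], ['d3'], ['d5']]
```

In this sketch, the saving comes from stage 2 comparing one centroid per micro-cluster against the event clusters, rather than comparing every document against every event; how closely this matches the paper's actual procedure cannot be confirmed from the abstract alone.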