Document Clustering Method Based on Frequent Co-occurring Words

Ye-Hang Zhu, Guan-Zhong Dai, Benjamin C. M. Fung, De-Jun Mu
2006-01-01
Abstract: This paper presents a new document clustering method based on frequent co-occurring words. We first apply Singular Value Decomposition and then group the words into clusters, called word representatives, which substitute for the corresponding words in the original documents. Next, we extract the frequent word representative sets with Apriori. Each document is then assigned to a basic unit described by a frequent word representative set, and the final clusters are obtained from these basic units by hierarchical clustering. The major advantage of our method is that it produces cluster descriptions, first in terms of the frequent word representatives and then in terms of the corresponding words, as part of the clustering process without any extra work. Compared with the state-of-the-art UPGMA method on benchmark datasets, our method performs better in terms of entropy and cluster purity.
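The middle steps of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The word-to-representative mapping (WORD_TO_REP) is written by hand as a stand-in for the SVD-based word grouping, the documents and the support threshold are toy values, and the assignment rule (pick the largest frequent set a document contains) is an assumption, since the abstract does not state how documents are assigned to basic units. The final hierarchical merging of basic units is omitted.

```python
from itertools import combinations

# Hypothetical mapping from words to word representatives. In the paper this
# grouping is derived with SVD; here it is hand-written for illustration only.
WORD_TO_REP = {
    "car": "R_vehicle", "truck": "R_vehicle",
    "engine": "R_motor", "motor": "R_motor",
    "bank": "R_finance", "loan": "R_finance",
    "interest": "R_rate", "rate": "R_rate",
}

def to_representatives(tokens):
    """Replace each known word by its representative; unknown words are dropped."""
    return frozenset(WORD_TO_REP[t] for t in tokens if t in WORD_TO_REP)

def frequent_sets(transactions, min_support):
    """Apriori-style level-wise search. Returns {itemset: support count}."""
    def support(candidate):
        return sum(1 for t in transactions if candidate <= t)

    # Level 1: frequent single representatives.
    level = {}
    for item in {i for t in transactions for i in t}:
        single = frozenset([item])
        count = support(single)
        if count >= min_support:
            level[single] = count

    result = {}
    k = 1
    while level:
        result.update(level)
        k += 1
        # Join step: merge frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        next_level = {}
        for cand in candidates:
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(s) in level for s in combinations(cand, k - 1)):
                count = support(cand)
                if count >= min_support:
                    next_level[cand] = count
        level = next_level
    return result

def assign_documents(transactions, frequent):
    """Assumed rule: assign each document to the largest frequent set it contains,
    breaking ties by higher support and then by sorted item names."""
    units = []
    for t in transactions:
        contained = [s for s in frequent if s <= t]
        if not contained:
            units.append(None)
            continue
        units.append(max(contained, key=lambda s: (len(s), frequent[s], sorted(s))))
    return units

# Toy documents, already tokenized (hypothetical data).
DOCS = [
    ["car", "engine"],
    ["truck", "motor"],
    ["car", "motor", "bank"],
    ["bank", "interest"],
    ["loan", "rate"],
]

transactions = [to_representatives(d) for d in DOCS]
frequent = frequent_sets(transactions, min_support=2)
units = assign_documents(transactions, frequent)

for doc, unit in zip(DOCS, units):
    print(doc, "->", sorted(unit) if unit else None)
```

In this toy run the first three documents share one basic unit and the last two share another, and each unit is described directly by its word representatives, which is the property the abstract highlights. A fuller version would build the representatives from an SVD of the term-document matrix and merge the basic units hierarchically.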