A New Partitioning Based Algorithm for Document Clustering.

Zonghu Wang, Zhijing Liu, Donghui Chen, Kai Tang
DOI: https://doi.org/10.1109/fskd.2011.6019857
2011-01-01
Abstract: Document clustering is one of the key problems in text mining and information retrieval. It groups text documents in a way that maximizes the similarity within clusters and minimizes the similarity between different clusters. Most partitioning-based algorithms are sensitive to the initial centroids: the clustering result depends heavily on which centroids are chosen at the start. This paper first uses an unsupervised feature selection method to reduce the dimension of the document feature space, and then proposes a novel partitioning-based algorithm that selects initial cluster centroids during the clustering process according to the size and density of the clusters in the dataset. Experiments on several text datasets show that the proposed approach effectively improves the quality of clustering.
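The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of the general idea of density-aware seeding for a partitioning (k-means style) clusterer, not the authors' method. The function names (`density_seeds`, `kmeans`), the cosine similarity measure, the `threshold` parameter, and the toy term-frequency vectors are all assumptions made for illustration. Feature selection is omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def density_seeds(docs, k, threshold=0.8):
    """Pick up to k seed indices (hypothetical heuristic, not from the paper).

    Repeatedly take the document with the most neighbours (similarity >=
    threshold) among the remaining documents, then drop it and its
    neighbours so the next seed comes from a different dense region.
    """
    remaining = list(range(len(docs)))
    seeds = []
    while remaining and len(seeds) < k:
        best = max(
            remaining,
            key=lambda i: sum(cosine(docs[i], docs[j]) >= threshold for j in remaining),
        )
        seeds.append(best)
        remaining = [j for j in remaining if cosine(docs[best], docs[j]) < threshold]
    return seeds

def assign(docs, centroids):
    """Label each document with the index of its most similar centroid."""
    return [max(range(len(centroids)), key=lambda c: cosine(d, centroids[c])) for d in docs]

def kmeans(docs, k, iters=10, threshold=0.8):
    """Partitioning clustering that starts from density-based seeds."""
    centroids = [list(docs[i]) for i in density_seeds(docs, k, threshold)]
    labels = assign(docs, centroids)
    for _ in range(iters):
        # Recompute each centroid as the mean of its current members.
        for c in range(len(centroids)):
            members = [d for d, l in zip(docs, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(docs, centroids)
        if new_labels == labels:  # assignments are stable, stop early
            break
        labels = new_labels
    return labels

# Toy corpus: three documents dominated by term 0, three by term 2.
docs = [[3, 0, 0], [2, 1, 0], [3, 1, 0], [0, 0, 3], [0, 1, 3], [0, 1, 2]]
seeds = density_seeds(docs, 2)
labels = kmeans(docs, 2)
```

Because the seeds are taken from separate dense regions instead of being drawn at random, repeated runs on the same data give the same partition, which is the sensitivity problem the abstract describes.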