Clustering Algorithm on Block Division of Documents

Gang Liu, Mingyue Luo
DOI: https://doi.org/10.1109/wicom.2010.5600166
2010-01-01
Abstract: In the traditional K-means algorithm, the choice of the number of clusters and of the initial cluster centers strongly affects clustering quality. To reduce the dependence on the initial centers and to classify newly arriving data quickly, an algorithm suited to text data is proposed. The algorithm takes document density as a parameter. Documents are first divided into blocks, and each block is then clustered separately. Experiments show that the algorithm not only improves clustering quality but also handles incrementally added data well.
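The abstract gives only the outline of the method (density as a parameter, division into blocks, per-block clustering), so the following is a rough sketch under stated assumptions rather than the authors' procedure. It assumes bag-of-words vectors, cosine similarity, a density defined as the number of documents within a similarity threshold, and a greedy block division seeded at the densest document. All function names, the `sim_threshold` parameter, and the seeding of k-means with the first `k` documents of a block are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean term-weight vector of a non-empty group of documents."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def divide_into_blocks(vectors, sim_threshold=0.3):
    """Assumed block division: repeatedly seed a block at the densest
    unassigned document and pull in its unassigned neighbours."""
    n = len(vectors)
    neighbours = [
        {j for j in range(n) if cosine(vectors[i], vectors[j]) >= sim_threshold}
        for i in range(n)
    ]
    unassigned = set(range(n))
    blocks = []
    while unassigned:
        # Density = number of still-unassigned neighbours; ties go to the lowest index.
        seed = max(sorted(unassigned), key=lambda i: len(neighbours[i] & unassigned))
        block = sorted((neighbours[seed] & unassigned) | {seed})
        blocks.append(block)
        unassigned -= set(block)
    return blocks


def kmeans_block(vectors, block, k=2, iterations=10):
    """Plain k-means with cosine similarity, restricted to one block.
    Centres start at the first k documents of the block (an assumption)."""
    k = min(k, len(block))
    centres = [dict(vectors[i]) for i in block[:k]]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for i in block:
            best = max(range(k), key=lambda c: cosine(vectors[i], centres[c]))
            groups[best].append(i)
        # Keep the old centre if a group happens to be empty.
        centres = [
            centroid([vectors[i] for i in g]) if g else centres[c]
            for c, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def assign_new(text, vectors, blocks):
    """Route a new document to the block with the most similar centroid."""
    v = vectorize(text)
    cents = [centroid([vectors[i] for i in b]) for b in blocks]
    return max(range(len(blocks)), key=lambda b: cosine(v, cents[b]))


docs = [
    "kmeans cluster center initial",
    "cluster center kmeans quality",
    "football match goal team",
    "team goal football league",
]
vectors = [vectorize(d) for d in docs]
blocks = divide_into_blocks(vectors)
clusters = [kmeans_block(vectors, b, k=1) for b in blocks]
new_block = assign_new("football team wins", vectors, blocks)
print(blocks, clusters, new_block)
```

Because a new document is compared only against block centroids, it can be routed without re-clustering the whole collection, which is one plausible reading of how a block-based scheme could handle incrementally added data.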