An Improved K-Means Algorithm for Documents Clustering

万小军,杨建武,陈晓鸥
DOI: https://doi.org/10.3969/j.issn.1000-3428.2003.02.043
2003-01-01
Abstract: This paper first introduces partitioning-based k-means algorithms for document clustering. The k-means algorithm is well suited to processing large document collections, but it is sensitive to outliers. The paper therefore proposes separating the clustering centroid from the clustering seed and presents an algorithm based on this idea to improve k-means. Experimental results show that the proposed algorithm is more accurate and more stable than the standard k-means algorithm.
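The outlier sensitivity mentioned in the abstract can be illustrated with a minimal sketch of standard k-means, in which the seed used to assign points is the cluster mean itself. This is not the paper's improved algorithm, whose details the abstract does not give. The toy 2-D points, the initial seeds, and the function names (`kmeans`, `mean`, `sq_dist`) are assumptions made only for illustration; real document vectors would be high-dimensional term weights.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, seeds, iterations=10):
    """Standard k-means: each cluster's seed for the next pass is its centroid."""
    centroids = list(seeds)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: recompute each centroid as the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [mean(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical toy data: two tight groups of four points each.
docs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
seeds = [(0.0, 0.0), (10.0, 10.0)]

clean_centroids, _ = kmeans(docs, seeds)
# Adding a single distant point drags the second centroid away from its group.
noisy_centroids, _ = kmeans(docs + [(30.0, 30.0)], seeds)

print(clean_centroids)
print(noisy_centroids)
```

In this toy run the second centroid moves from the middle of its group toward the single distant point, and because that shifted centroid also acts as the seed for the next assignment pass, the distortion can propagate. That coupling between seed and centroid is what the abstract proposes to break.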