Subspace clustering of text documents with feature weighting k-means algorithm

Liping Jing, Michael K. Ng, Jun Xu, Joshua Zhexue Huang
DOI: https://doi.org/10.1007/11430919_94
2005-01-01
Abstract: This paper presents a new method for clustering large and complex text data. The method is based on a new subspace clustering algorithm that automatically calculates the feature weights during the k-means clustering process. In clustering sparse text data, the feature weights are used to discover clusters from subspaces of the document vector space and to identify keywords that represent the semantics of the clusters. We present a modification of the published algorithm to solve the sparsity problem that occurs in text clustering. Experimental results on real-world text data show that the new method outperforms the Standard KMeans and Bisection-KMeans algorithms while maintaining the efficiency of the k-means clustering process.
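The abstract does not give the update rules, so the following is only a minimal sketch of the general idea of feature weighting inside k-means, not the paper's exact algorithm. It assumes one weight vector per cluster, weights inversely related to within-cluster dispersion, and a small constant added to each dispersion so that features with zero spread (common in sparse text) do not cause a division by zero. The function name `weighted_kmeans` and the parameters `beta` and `sigma` are hypothetical choices made for illustration.

```python
import random


def weighted_kmeans(X, k, beta=2.0, sigma=1e-3, iters=20, seed=0):
    """Illustrative k-means with per-cluster feature weights.

    Assumptions (not taken from the abstract): weights are raised to the
    power `beta` in the distance, and `sigma` is added to every
    per-feature dispersion to cope with sparse, zero-variance features.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    centers = [list(x) for x in rng.sample(X, k)]
    weights = [[1.0 / m] * m for _ in range(k)]
    labels = [0] * n
    exponent = 1.0 / (beta - 1.0)

    for _ in range(iters):
        # Assignment step: nearest center under the weighted squared distance.
        for i, x in enumerate(X):
            labels[i] = min(
                range(k),
                key=lambda l: sum(
                    (weights[l][j] ** beta) * (x[j] - centers[l][j]) ** 2
                    for j in range(m)
                ),
            )

        for l in range(k):
            members = [X[i] for i in range(n) if labels[i] == l]
            if not members:
                continue  # keep the old center and weights for an empty cluster

            # Center update: per-feature mean of the cluster members.
            centers[l] = [
                sum(x[j] for x in members) / len(members) for j in range(m)
            ]

            # Dispersion per feature, shifted by sigma so it is never zero.
            disp = [
                sum((x[j] - centers[l][j]) ** 2 for x in members) + sigma
                for j in range(m)
            ]

            # Weight update: low-dispersion features get high weight;
            # the weights of each cluster sum to 1.
            weights[l] = [
                1.0 / sum((disp[j] / disp[t]) ** exponent for t in range(m))
                for j in range(m)
            ]

    return labels, centers, weights


# Toy "documents" with three term features (made-up values).
docs = [
    [1.0, 0.0, 0.2], [1.0, 0.0, 0.9], [0.9, 0.0, 0.5],
    [0.0, 1.0, 0.1], [0.0, 1.0, 0.8], [0.0, 0.9, 0.4],
]
labels, centers, weights = weighted_kmeans(docs, k=2)
```

In a sketch like this, the features with the largest weights in a cluster play the role of that cluster's keywords, and the set of highly weighted features defines the subspace in which the cluster is found.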