Text clustering based on term weights automatic partition

Yu Yonghong, Bai Wenyang
DOI: https://doi.org/10.1109/ICCAE.2010.5451390
2010-01-01
Abstract: Text clustering is becoming increasingly popular because of the growing volume of text on the Web and the demands of real applications. This paper introduces a novel automatic text clustering method. First, a genetic algorithm is applied to term selection, using its global optimization ability and high search efficiency to reduce dimensionality. Next, an appropriate number of partitions of the document set is created according to different combinations of term weights. Each document partition is then clustered into initial clusters using a dynamic programming technique. Finally, all initial clusters are clustered by the same method to produce the final text clusters. The paper also provides analysis and proofs that the algorithm performs better in computational complexity, clustering quality, and the clustering of high-dimensional data.
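The term-selection step can be illustrated with a minimal sketch of a genetic algorithm that picks a fixed-size subset of terms. The abstract does not specify the encoding, fitness function, or operators, so everything here is an assumption made for illustration: the function name `select_terms`, the variance-based fitness, the union-based crossover, and the single-swap mutation are hypothetical and are not the authors' method.

```python
import random


def select_terms(weights, n_keep, generations=50, pop_size=20, seed=0):
    """Toy genetic algorithm that chooses n_keep term (column) indices.

    weights: list of documents, each a list of term weights (e.g. TF-IDF).
    All design choices below are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_terms = len(weights[0])

    def fitness(subset):
        # Assumed fitness: summed variance of the selected term columns,
        # on the idea that terms whose weights vary across documents
        # are more useful for separating them.
        total = 0.0
        for j in subset:
            col = [row[j] for row in weights]
            mean = sum(col) / len(col)
            total += sum((x - mean) ** 2 for x in col) / len(col)
        return total

    def random_subset():
        return tuple(sorted(rng.sample(range(n_terms), n_keep)))

    def crossover(a, b):
        # Child draws n_keep terms from the union of both parents.
        pool = sorted(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, n_keep)))

    def mutate(subset, rate=0.2):
        # With probability `rate`, swap one selected term for an unused one.
        chosen = list(subset)
        if rng.random() < rate:
            unused = [j for j in range(n_terms) if j not in chosen]
            if unused:
                chosen[rng.randrange(n_keep)] = rng.choice(unused)
        return tuple(sorted(chosen))

    population = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # elitism: keep the best half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)
```

Calling `select_terms(weights, n_keep=2)` on a small document-by-term weight matrix returns a sorted tuple of two column indices. The reduced matrix built from those columns would then feed the partitioning and clustering stages that the abstract describes, which this sketch does not attempt to reproduce.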