TCUAP: A Novel Approach of Text Clustering Using Asymmetric Proximity.

Shaoxu Song, Chunping Li
2005-01-01
Abstract: Text documents occupy sparse data spaces, and existing text clustering methods use symmetric proximity to measure the correlation between documents. In this paper, we propose a novel approach that uses asymmetric proximity for text clustering to strengthen the discriminative features of document objects. We present a measure of asymmetric proximity between documents and between clusters. TCUAP is an agglomerative hierarchical clustering algorithm that performs the clustering analysis using the strong components of a sparse matrix. Experimental evaluation on textual data sets demonstrates the validity and efficiency of our approach. The results show that the asymmetric proximity measure achieves higher accuracy than the symmetric one.
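As a rough illustration of the idea in the abstract, the sketch below computes a directed (asymmetric) proximity between documents, keeps only the strong links to form a sparse directed graph, and groups documents by its strongly connected components. The abstract does not define the proximity measure, so `asymmetric_proximity` (the share of one document's distinct terms that also occur in the other), the `threshold` parameter, and all function names are assumptions made for this example, not the authors' definitions. It covers a single grouping step only, not the full agglomerative hierarchy of TCUAP.

```python
from collections import defaultdict


def asymmetric_proximity(doc_a, doc_b):
    """Share of doc_a's distinct terms that also occur in doc_b.

    Hypothetical measure chosen for illustration: it is asymmetric because
    the normaliser depends only on doc_a.
    """
    terms_a, terms_b = set(doc_a), set(doc_b)
    if not terms_a:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a)


def build_sparse_graph(docs, threshold):
    """Keep a directed edge i -> j only when proximity(i, j) >= threshold."""
    graph = defaultdict(set)
    for i, a in enumerate(docs):
        for j, b in enumerate(docs):
            if i != j and asymmetric_proximity(a, b) >= threshold:
                graph[i].add(j)
    return graph


def strong_components(n, graph):
    """Strongly connected components via Kosaraju's algorithm (iterative)."""
    # Pass 1: record nodes in order of DFS completion.
    visited, order = set(), []
    for start in range(n):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(sorted(graph.get(start, ()))))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(sorted(graph.get(nxt, ())))))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: explore the reversed graph in decreasing completion order.
    reverse = defaultdict(set)
    for u, targets in graph.items():
        for v in targets:
            reverse[v].add(u)

    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in reverse.get(node, ()):
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(sorted(component))
    return sorted(components)


# Toy corpus: each document is a list of terms.
docs = [
    ["text", "clustering", "proximity"],
    ["text", "clustering", "proximity", "sparse", "matrix"],
    ["graph", "component"],
    ["graph", "component", "strong"],
]
clusters = strong_components(len(docs), build_sparse_graph(docs, threshold=0.5))
```

In this toy corpus the proximity from the first document to the second differs from the proximity in the reverse direction, which is the asymmetry the abstract refers to. Two documents end up in the same group only when the thresholded links connect them in both directions, directly or through other documents.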