Text Clustering Based on Asymmetric Similarity

宋韶旭,李春平
DOI: https://doi.org/10.16511/j.cnki.qhdxxb.2006.07.037
2006-01-01
Abstract: Text clustering data sets have sparse data spaces, and existing text clustering methods use distance-based dissimilarity to measure document similarity. An asymmetric similarity approach to text clustering can strengthen the ability to discriminate between documents. The asymmetric similarity is measured by a clustering analysis of the strong components of the sparse matrix. The approach provides a conceptual structure after the hierarchical clustering. Tests on textual data sets show that the asymmetric similarity measure provides higher precision with less run time than the distance-based dissimilarity method. With small numbers of clusters, the accuracy is improved by about 20%.