High-Efficiency Text Clustering Algorithm Based on Semantic Distance

Feng Shao-rong, Xiao Wen-jun
DOI: https://doi.org/10.3321/j.issn:1000-565X.2008.05.006
2008-01-01
Abstract: Existing text clustering algorithms overlook the semantic relations between words and therefore compute text similarity with low accuracy. To address this, this paper proposes a new text clustering algorithm based on semantic distance. The method analyzes each text semantically and uses its specific semantics to calculate similarity. It adopts the nearest neighbor clustering algorithm and adds a second clustering pass to overcome that algorithm's sensitivity to the input order of the texts. Feature words representing each cluster are then selected according to their similarity weight, so that the retained feature words stay close to the theme of the cluster. Experimental results indicate that the proposed algorithm achieves higher clustering precision and recall than the k-Means algorithm based on the vector space model.
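The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the semantic-distance measure, the threshold, or the reassignment rule, so everything below is assumed. The names `jaccard`, `nearest_neighbor_clusters`, `second_pass`, and the `threshold` parameter are hypothetical, and Jaccard overlap of word sets merely stands in for the paper's semantic similarity.

```python
from typing import Callable, List, Sequence, Set

Similarity = Callable[[Set[str], Set[str]], float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity (assumption); NOT the paper's semantic-distance measure."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_neighbor_clusters(
    docs: Sequence[Set[str]], sim: Similarity, threshold: float
) -> List[List[int]]:
    """Single pass in input order: join the cluster holding the most similar
    document if that similarity reaches the threshold, else open a new cluster."""
    clusters: List[List[int]] = []
    for i, doc in enumerate(docs):
        best_cluster, best_sim = -1, threshold
        for c, members in enumerate(clusters):
            s = max(sim(doc, docs[j]) for j in members)
            if s >= best_sim:
                best_cluster, best_sim = c, s
        if best_cluster < 0:
            clusters.append([i])
        else:
            clusters[best_cluster].append(i)
    return clusters


def second_pass(
    docs: Sequence[Set[str]],
    clusters: List[List[int]],
    sim: Similarity,
    threshold: float,
) -> List[List[int]]:
    """Assumed refinement rule: move a document only if another cluster's mean
    similarity to it strictly exceeds that of its current cluster. Scores are
    computed against the first-pass clusters, so the result does not depend on
    the order in which documents are revisited."""
    label = {i: c for c, members in enumerate(clusters) for i in members}

    def mean_sim(i: int, members: List[int]) -> float:
        others = [j for j in members if j != i]
        if not others:
            # A singleton keeps its own cluster unless another cluster is,
            # on average, more similar than the threshold (assumption).
            return threshold
        return sum(sim(docs[i], docs[j]) for j in others) / len(others)

    refined: List[List[int]] = [[] for _ in clusters]
    for i in range(len(docs)):
        best = label[i]
        best_score = mean_sim(i, clusters[best])
        for c, members in enumerate(clusters):
            score = mean_sim(i, members)
            if score > best_score:
                best, best_score = c, score
        refined[best].append(i)
    return [members for members in refined if members]
```

The first pass depends on the order in which texts arrive, because each text is compared only with clusters that already exist. The second pass revisits every text against all first-pass clusters, which is one plausible way to reduce that dependence; the paper's actual second clustering step may differ.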