ISTC: A New Method for Clustering Search Results

Wei Zhang, Baowen Xu, Weifeng Zhang, Junling Xu
DOI: https://doi.org/10.1007/s11859-008-0424-6
2008-01-01
Wuhan University Journal of Natural Sciences
Abstract: A new common-phrase scoring method is proposed, based on term frequency-inverse document frequency (TFIDF) and the independence of a phrase. Combining the two properties helps identify more reasonable common phrases, which improves the accuracy of clustering. An equation for measuring the independence of a phrase is also proposed in this paper. The new algorithm, which improves on the suffix tree clustering (STC) algorithm, is named improved suffix tree clustering (ISTC). To validate the proposed algorithm, a prototype system was implemented and used to cluster several groups of web search results obtained from the Google search engine. Experimental results show that the improved algorithm offers higher accuracy than traditional suffix tree clustering.
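The abstract does not give the scoring equation, so the following is only an illustrative sketch of how a TFIDF weight and a phrase-independence measure might be combined. Everything in it is an assumption rather than the paper's method: the function names, the smoothed IDF term, the use of neighbouring-word entropy as a stand-in for "independence", and the multiplicative combination are all hypothetical.

```python
import math
from collections import Counter

def occurrences(doc, phrase):
    """Start indices at which `phrase` (tuple of tokens) occurs in `doc` (list of tokens)."""
    n = len(phrase)
    return [i for i in range(len(doc) - n + 1) if tuple(doc[i:i + n]) == phrase]

def tfidf(docs, phrase):
    """Collection-level TF times a smoothed IDF (assumed form, not the paper's)."""
    tf = sum(len(occurrences(d, phrase)) for d in docs)
    df = sum(1 for d in docs if occurrences(d, phrase))
    if df == 0:
        return 0.0
    return tf * math.log(1 + len(docs) / df)

def entropy(counter):
    """Shannon entropy (natural log) of a Counter of observations."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((v / total) * math.log(v / total) for v in counter.values())

def independence(docs, phrase):
    """Assumed proxy for independence: mean entropy of the words seen
    immediately left and right of the phrase. A phrase that always has the
    same neighbour is probably a fragment of a longer phrase."""
    left, right = Counter(), Counter()
    for d in docs:
        for i in occurrences(d, phrase):
            j = i + len(phrase)
            left[d[i - 1] if i > 0 else "<s>"] += 1       # "<s>" marks snippet start
            right[d[j] if j < len(d) else "</s>"] += 1    # "</s>" marks snippet end
    return (entropy(left) + entropy(right)) / 2

def score(docs, phrase):
    """Hypothetical combined score: TFIDF boosted by independence."""
    return tfidf(docs, phrase) * (1 + independence(docs, phrase))

snippets = [
    ["suffix", "tree", "clustering", "algorithm"],
    ["improved", "suffix", "tree", "clustering"],
    ["suffix", "tree", "index"],
]
full = ("suffix", "tree")
fragment = ("tree",)
print(score(snippets, full) > score(snippets, fragment))
```

In this toy input both phrases have the same TFIDF weight, but "tree" is always preceded by "suffix", so its independence proxy is lower and the full phrase "suffix tree" ranks above the fragment.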