A No-Word-Segmentation Hierarchical Clustering Approach to Chinese Web Search Results.

Hui Zhang, Liping Zhao, Rui Liu, Deqing Wang
DOI: https://doi.org/10.1007/978-3-540-68636-1_66
2008-01-01
Abstract: In this paper, we present a No-Word-Segmentation Hierarchical Clustering Approach (NWSHCA) to Chinese Web search results. The approach first computes a new similarity measure between two documents, based on a variation of the Edit Distance, and generates preliminary clusters using a partitioning clustering method. Next, it ranks all common substrings in a cluster using a cluster-discriminative metric and takes the top K as cluster description labels. Finally, it applies hierarchical agglomerative clustering (HAC) to the top K cluster labels to form a navigational tree. In contrast to most clustering algorithms, NWSHCA can generate overlapping clusters. Experimental results show that the approach is feasible and effective.
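As a rough illustration of the first step, the sketch below computes a character-level similarity between two snippets without any word segmentation. The abstract does not specify the paper's variation of the Edit Distance, so this is an assumption: it uses the standard Levenshtein distance normalized by the longer string's length, and the function names `edit_distance` and `similarity` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance over characters (no word segmentation).

    Assumption: the paper uses a *variation* of this measure that the
    abstract does not describe; this is the textbook version only.
    """
    # Keep the shorter string in the inner loop to limit row length.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical strings.

    Assumption: normalizing by the longer length is an illustrative
    choice, not the paper's definition.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    # Chinese snippets are compared character by character.
    print(similarity("北京大学", "北京大学生"))
```

Because the comparison works directly on characters, Chinese text needs no tokenizer. A pairwise similarity of this kind could then feed a partitioning method to form the preliminary clusters.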