A Phrase-Based Method For Hierarchical Clustering Of Web Snippets

Zhao Li, Xindong Wu
DOI: https://doi.org/10.1609/aaai.v24i1.7773
2010-01-01
Abstract: Document clustering has been applied in web information retrieval, where it helps users browse quickly by organizing retrieved results into groups. A tree-like hierarchical structure is particularly well suited to organizing retrieved results for web users. We therefore introduce a new method for hierarchical clustering of web snippets that exploits a phrase-based document index. In our method, the hierarchy is built over phrases rather than over all snippets, and the snippets are then assigned to the corresponding phrase clusters. We show that, compared with traditional hierarchical clustering, our method not only produces meaningful cluster labels but also improves clustering performance.
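The general idea of clustering phrases first and attaching snippets afterwards can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify how phrases are extracted or compared, so the use of word bigrams as phrases, a minimum document frequency of 2, Jaccard similarity over snippet sets, and greedy pairwise merging are all assumptions made for illustration. The function names (`phrase_index`, `build_hierarchy`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def phrase_index(snippets, n=2, min_df=2):
    """Map each word n-gram (assumed stand-in for a 'phrase') to the
    set of snippet ids that contain it, keeping only shared phrases."""
    index = defaultdict(set)
    for sid, text in enumerate(snippets):
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            index[" ".join(words[i:i + n])].add(sid)
    return {p: ids for p, ids in index.items() if len(ids) >= min_df}


def jaccard(a, b):
    """Overlap of two snippet-id sets (assumed similarity measure)."""
    return len(a & b) / len(a | b)


def build_hierarchy(index):
    """Greedily merge the two most similar phrase clusters until no
    remaining pair shares a snippet. Returns the list of root nodes."""
    nodes = [{"phrases": [p], "snippets": set(ids), "children": []}
             for p, ids in sorted(index.items())]
    while len(nodes) > 1:
        sim, i, j = max(
            (jaccard(nodes[i]["snippets"], nodes[j]["snippets"]), i, j)
            for i, j in combinations(range(len(nodes)), 2))
        if sim == 0:
            break  # remaining clusters are unrelated
        a, b = nodes[i], nodes[j]
        parent = {
            "phrases": a["phrases"] + b["phrases"],    # doubles as the label
            "snippets": a["snippets"] | b["snippets"],  # snippets follow phrases
            "children": [a, b],
        }
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [parent]
    return nodes
```

In this sketch the clustering loop runs over phrases, which are usually far fewer than the snippets, and each cluster's phrase list serves directly as a readable label. Snippets are never compared with each other; they are attached to a cluster only through the phrases they contain.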