A token-based online web-snippet clustering approach based on directed probability graph

Tianfang Yao, Jianchao Li
2009-01-01
Journal of Computational Information Systems
Abstract: Online Web-snippet clustering is a challenging task in document clustering. It helps users cope with the large number of records that current search engines return at once, especially when a query involves multiple subtopics. In this paper, we first propose a novel approach that uses tokens as the fundamental units for Web-snippet clustering, which avoids the word segmentation problem of oriental languages and is also suitable for any other language. We then put forward the Directed Probability Graph (DPG) model, which recognizes expressive phrases as cluster labels without requiring any external knowledge. The approach does not need to calculate the similarity between pairs of documents. Experimental results show that our clustering algorithm is very efficient and can be effectively applied to online Web-snippet clustering. Moreover, the approach can be applied to the preprocessing of Web pages for opinion mining. 1553-9105/ Copyright © 2009 Binary Information Press.