Annotation-aware web clustering based on topic model and random walks

Jiashen Sun, Xiaojie Wang, Caixia Yuan, Guannan Fang
DOI: https://doi.org/10.1109/CCIS.2011.6045023
2011-01-01
Abstract: Web page clustering based on semantics or topics promises improved search and browsing on the web. Intuitively, tags from social bookmarking websites such as del.icio.us can serve as a source of information complementary to the document text, thus improving the clustering of web pages. In this paper, we present a novel model that employs a topic model to associate each annotated document with a distribution over topics, then constructs a graph of tags, documents, and topics and performs random walks on this graph for clustering. We evaluate our model on a real-world data set and show that it achieves better clustering performance than an algorithm that uses page text alone.
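The pipeline described in the abstract can be illustrated with a minimal Python sketch: per-document topic distributions, a graph linking tags, documents, and topics, then random walks on that graph for clustering. This is not the authors' implementation. The topic distributions are assumed to be given (for example, by LDA). The following are all illustrative assumptions:

- the edge weighting, with tag counts normalized per document;
- the random-walk-with-restart formulation and its restart probability of 0.15;
- the rule of assigning each document to the topic node its walk visits most.

The function names `build_graph`, `random_walk_with_restart`, and `cluster_documents` are hypothetical.

```python
from collections import defaultdict

# Illustrative sketch only: edge weights, restart probability, and the
# cluster-assignment rule are assumptions, not the paper's exact method.


def build_graph(doc_topics, doc_tags):
    """Build an undirected weighted graph over document, topic, and tag nodes.

    doc_topics: {doc: {topic: probability}}, assumed output of a topic model.
    doc_tags:   {doc: {tag: count}}, assumed social-bookmarking annotations.
    """
    graph = defaultdict(dict)

    def link(a, b, w):
        # Only positive weights become edges, so every node has nonzero out-weight.
        if w > 0:
            graph[a][b] = w
            graph[b][a] = w

    for doc, dist in doc_topics.items():
        for topic, p in dist.items():
            link(("doc", doc), ("topic", topic), p)
    for doc, tags in doc_tags.items():
        total = sum(tags.values())
        if total == 0:
            continue
        for tag, count in tags.items():
            # Normalize tag counts per document so tags and topics carry comparable weight.
            link(("doc", doc), ("tag", tag), count / total)
    return dict(graph)


def random_walk_with_restart(graph, start, restart=0.15, iters=100):
    """Power iteration for a random walk with restart; returns a probability per node."""
    if start not in graph:
        raise ValueError(f"start node {start!r} has no edges")
    p = {n: 0.0 for n in graph}
    p[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in graph}
        for node, mass in p.items():
            if mass == 0.0:
                continue
            neighbors = graph[node]
            total = sum(neighbors.values())
            for nb, w in neighbors.items():
                # With probability (1 - restart), follow an edge in proportion to its weight.
                nxt[nb] += (1.0 - restart) * mass * w / total
        # With probability restart, jump back to the start document.
        nxt[start] += restart
        p = nxt
    return p


def cluster_documents(doc_topics, doc_tags, restart=0.15, iters=100):
    """Assign each document to the topic node its walk visits most (an assumed rule)."""
    graph = build_graph(doc_topics, doc_tags)
    clusters = {}
    for doc in doc_topics:
        scores = random_walk_with_restart(graph, ("doc", doc), restart, iters)
        topic_scores = {node[1]: s for node, s in scores.items() if node[0] == "topic"}
        clusters[doc] = max(topic_scores, key=topic_scores.get)
    return clusters


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    doc_topics = {"d1": {0: 0.9, 1: 0.1}, "d2": {0: 0.4, 1: 0.6}, "d3": {0: 0.1, 1: 0.9}}
    doc_tags = {"d1": {"python": 2}, "d2": {"python": 1}, "d3": {"cooking": 3}}
    print(cluster_documents(doc_topics, doc_tags))
```

Tag nodes are shared across documents, so two pages with similar bookmarks exchange probability mass through their common tags. This is one way annotations can shift a cluster assignment beyond what the page text alone suggests.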