Web Clustering Based on Tag Set Similarity.

Jingli Zhou, Xuejun Nie, Leihua Qin, Jianfeng Zhu
DOI: https://doi.org/10.4304/jcp.6.1.59-66
2011-01-01
Abstract: Tagging is a service that allows users to associate a set of freely chosen tags with web content. Clustering web documents by their tag sets eliminates the time-consuming preprocessing step of word stemming. This paper proposes a novel method that computes the similarity between tag sets and uses it as the distance measure for clustering web documents into groups. The major steps of the method are computing a tag similarity matrix with a set-based vector space model, smoothing the similarity matrix to obtain a set of linearly independent vectors, and computing the tag set similarity from these vectors. The experimental results show that the proposed tag set similarity measure surpasses other common similarity measures not only in the reliable derivation of clustering results, but also in clustering accuracy and efficiency.
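The abstract names three steps without giving formulas, so the following is only an illustrative sketch of that general pipeline, not the authors' algorithm. It assumes tag similarity is the cosine between the tags' document-incidence vectors, that smoothing means shrinking the matrix toward the identity (which makes it positive definite, i.e. the Gram matrix of linearly independent vectors), and that tag set similarity is a normalized sum of pairwise tag similarities. All function names and the parameter alpha are hypothetical.

```python
import math

def tag_similarity_matrix(docs, tags):
    """Cosine similarity between tags, each tag seen as the set of docs using it."""
    index = {t: i for i, t in enumerate(tags)}
    occ = [set() for _ in tags]
    for d, tag_set in enumerate(docs):
        for t in tag_set:
            occ[index[t]].add(d)
    n = len(tags)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if occ[i] and occ[j]:
                S[i][j] = len(occ[i] & occ[j]) / math.sqrt(len(occ[i]) * len(occ[j]))
    return S

def smooth(S, alpha=0.5):
    """Assumed smoothing: (1 - alpha) * S + alpha * I, positive definite for alpha > 0."""
    n = len(S)
    return [[(1 - alpha) * S[i][j] + (alpha if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def tag_set_similarity(a, b, tags, S):
    """Normalized bilinear form over the smoothed tag similarity matrix."""
    index = {t: i for i, t in enumerate(tags)}
    def form(x, y):
        return sum(S[index[s]][index[t]] for s in x for t in y)
    denom = math.sqrt(form(a, a) * form(b, b))
    return form(a, b) / denom if denom else 0.0

# Toy corpus (made up for illustration): each document is a set of tags.
docs = [{"python", "code"}, {"python", "programming"},
        {"code", "programming"}, {"recipe", "food"}]
tags = sorted(set().union(*docs))
S = smooth(tag_similarity_matrix(docs, tags), alpha=0.5)

related = tag_set_similarity({"python"}, {"code"}, tags, S)
unrelated = tag_set_similarity({"python"}, {"recipe"}, tags, S)
```

In this toy corpus the disjoint tag sets {"python"} and {"code"} still receive a positive similarity because the two tags co-occur in a document, whereas {"python"} and {"recipe"} score zero; an exact-overlap measure such as Jaccard would score both pairs as zero. The resulting value (or one minus it) could then be passed to any standard clustering routine as the pairwise measure.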