A Wikipedia-Based Semantic Model For Text Clustering

Jing-Min Zhou, Qing-Jun Cui, Hui Zhang
2011-01-01
Abstract: Taking advantage of the accuracy and comprehensiveness of Wikipedia, we mine semantic knowledge from Wikipedia abstracts and introduce a Wikipedia-based semantic model for text clustering. In this model, words or phrases that are closely related in Wikipedia abstracts are gathered into semantic groups, which we call "semantic clusters". The model also includes a Semantic-rank algorithm that computes the significance of each word or phrase in a semantic cluster. Inspired by the electric force that a source charge exerts on a test charge, we introduce a new concept, the "semantic attractive force" between a semantic cluster and a document. We apply the formula for semantic attractive force to text clustering, yielding a Wikipedia-based semantic clustering method. Experimental results show that, compared with traditional keyword-based text clustering, the proposed semantic model improves the quality of both the clusters and the cluster labels.
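The abstract does not state the formula for semantic attractive force, so the following is only an illustrative sketch of the charge analogy, not the authors' method. It assumes a Coulomb-style score in which the "charges" are the summed term weights of the cluster and the matching term counts of the document, and the "distance" is derived from cosine similarity. The function names `attractive_force` and `assign`, the constant `k`, and the distance definition are all hypothetical.

```python
import math
from collections import Counter


def attractive_force(cluster_weights, doc_tokens, k=1.0):
    """Hypothetical Coulomb-style score between a cluster and a document.

    cluster_weights: dict mapping term -> positive significance weight
                     (standing in for Semantic-rank output; an assumption).
    doc_tokens:      list of tokens in the document.
    """
    tf = Counter(doc_tokens)
    shared = [t for t in cluster_weights if t in tf]
    if not shared:
        return 0.0  # no shared terms: assume no attraction

    # "Charges": total cluster weight and count of matching document terms.
    q_cluster = sum(cluster_weights.values())
    q_doc = sum(tf[t] for t in shared)

    # Cosine similarity between the cluster weight vector and term counts.
    dot = sum(cluster_weights[t] * tf[t] for t in shared)
    norm_c = math.sqrt(sum(w * w for w in cluster_weights.values()))
    norm_d = math.sqrt(sum(c * c for c in tf.values()))
    cos = dot / (norm_c * norm_d)

    # Assumed "distance" in [1, 2]: shrinks as similarity grows, never zero.
    r = 1.0 + (1.0 - cos)
    return k * q_cluster * q_doc / (r * r)


def assign(doc_tokens, clusters):
    """Return the name of the cluster that attracts the document most."""
    return max(clusters, key=lambda name: attractive_force(clusters[name], doc_tokens))


clusters = {
    "physics": {"charge": 0.6, "force": 0.4},
    "music": {"guitar": 0.7, "chord": 0.3},
}
doc = ["electric", "force", "charge", "charge"]
label = assign(doc, clusters)
```

In this toy setup the cluster name doubles as the cluster label, which mirrors the abstract's point that the model is evaluated on both clusters and labels.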