Short text clustering with expanding keywords through concept graph

Huang Xiaohui, Ye Yunming, Du Xiaolin, Deng Shengchun
DOI: https://doi.org/10.12733/jcis7371
2013-01-01
Journal of Computational Information Systems
Abstract: In most text clustering algorithms, the feature vector representing a document is obtained by computing a feature value (e.g., tf-idf) for each term using only that document. Such methods perform poorly on short text clustering because a short text contains only a few terms (i.e., words). In this paper, we propose a keyword expansion approach based on a concept graph to address the shortage of keywords in short text clustering. Extracting keywords with a concept graph raises two key problems: (1) how to construct the concept graph; and (2) how to extract groups of keywords from it. To construct the concept graph, concepts are used as vertices, and an edge connects two concepts that share common terms. A hierarchical clustering algorithm is then used to partition the concept graph and extract the groups of keywords. The proposed approach is evaluated on a number of data sets, and the experimental results show that it outperforms both methods that do not expand keywords and methods that expand keywords with Wikipedia in terms of precision, recall, and F-score. © 2013 Binary Information Press.
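The two steps named in the abstract (building the concept graph, then partitioning it hierarchically) can be illustrated with a minimal sketch. The abstract does not specify edge weights, the linkage criterion, or the stopping rule, so the choices here are assumptions: edges are weighted by the number of shared terms, clusters are merged by single linkage, and merging stops at a requested number of groups. The function names and the toy concepts are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def build_concept_graph(concepts):
    """Return {(a, b): weight} for every pair of concepts sharing terms.

    `concepts` maps a concept name to its set of terms. The weight
    (number of shared terms) is an assumption, not the paper's definition.
    """
    edges = {}
    for a, b in combinations(sorted(concepts), 2):
        shared = concepts[a] & concepts[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


def cluster_concepts(concepts, edges, num_groups):
    """Single-link agglomerative clustering over the concept graph.

    Repeatedly merges the two clusters joined by the heaviest edge until
    `num_groups` clusters remain or no connecting edge is left.
    """
    clusters = [{name} for name in sorted(concepts)]

    def link(c1, c2):
        # Strongest edge between any member of c1 and any member of c2.
        return max(edges.get((min(a, b), max(a, b)), 0)
                   for a in c1 for b in c2)

    while len(clusters) > num_groups:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        if link(clusters[i], clusters[j]) == 0:
            break  # remaining clusters are disconnected
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


def keyword_groups(concepts, clusters):
    """Collect the terms of every concept in each cluster into one group."""
    return [set().union(*(concepts[c] for c in cluster))
            for cluster in clusters]


# Hypothetical toy concepts, for illustration only.
concepts = {
    "apple_fruit": {"apple", "fruit", "juice"},
    "orange_fruit": {"orange", "fruit", "juice"},
    "apple_company": {"apple", "iphone", "company"},
    "phone": {"iphone", "company", "android"},
}
edges = build_concept_graph(concepts)
clusters = cluster_concepts(concepts, edges, num_groups=2)
groups = keyword_groups(concepts, clusters)
```

In this toy example the fruit concepts and the technology concepts end up in separate clusters, and each resulting keyword group could then be used to expand the term vector of a short text that mentions any of its keywords.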