Fuzzy C-Means Text Clustering Based on Topic Concept Sub-Space

Xianghua Ji, Chao Chen, Zhengrong Shao, Nenghai Yu
DOI: https://doi.org/10.3969/j.issn.1003-7985.2007.03.028
2007-01-01
Abstract: To improve the accuracy of text clustering, fuzzy c-means clustering based on topic concept sub-space (TCS2FCM) is introduced for classifying texts. Five evaluation functions are combined to extract key phrases. Concept phrases, as well as the descriptions of the final clusters, are derived from the key phrases using WordNet(R). Initial centers and the membership matrix are the most important factors affecting clustering performance. Orthogonal concept topic sub-spaces are built from the topic concept phrases that represent the topics of the texts, and the initialization of the centers and the membership matrix depends on the concept vectors in these sub-spaces. The results show that, unlike the random initialization of traditional fuzzy c-means clustering, initialization based on text content can improve clustering precision.
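The role of initialization can be illustrated with a minimal sketch of standard fuzzy c-means in which the initial centers are passed in rather than drawn at random. This is not the TCS2FCM procedure itself: the abstract does not specify how the concept vectors are computed, so the toy document vectors, the orthogonal starting centers, and the function names (`memberships_from_centers`, `update_centers`, `fuzzy_c_means`) are all hypothetical stand-ins. Only the membership and center update rules are the standard fuzzy c-means formulas.

```python
import math


def memberships_from_centers(docs, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update: u[i][j] is proportional to d(i, j)^(-2/(m-1))."""
    power = -2.0 / (m - 1.0)
    U = []
    for x in docs:
        # Clamp distances away from zero so a document sitting on a center is safe.
        weights = [max(math.dist(x, c), eps) ** power for c in centers]
        total = sum(weights)
        U.append([w / total for w in weights])
    return U


def update_centers(docs, U, m=2.0):
    """Standard FCM center update: mean of documents weighted by u[i][j]^m."""
    n_clusters = len(U[0])
    dim = len(docs[0])
    centers = []
    for j in range(n_clusters):
        w = [row[j] ** m for row in U]
        total = sum(w)
        centers.append(
            [sum(wi * x[k] for wi, x in zip(w, docs)) / total for k in range(dim)]
        )
    return centers


def fuzzy_c_means(docs, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Run FCM from caller-supplied centers instead of a random start."""
    centers = [list(c) for c in init_centers]
    # The initial membership matrix is derived from the supplied centers.
    U = memberships_from_centers(docs, centers, m)
    for _ in range(max_iter):
        centers = update_centers(docs, U, m)
        U_new = memberships_from_centers(docs, centers, m)
        change = max(
            abs(a - b) for ra, rb in zip(U_new, U) for a, b in zip(ra, rb)
        )
        U = U_new
        if change < tol:
            break
    return centers, U


# Hypothetical toy data: each document is a 2-D vector of weights on two topics.
docs = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8]]
# Assumption: orthogonal unit vectors stand in for the topic concept vectors.
init_centers = [[1.0, 0.0], [0.0, 1.0]]

centers, U = fuzzy_c_means(docs, init_centers)
labels = [row.index(max(row)) for row in U]
print(labels)
```

Because the starting centers already point along the assumed topic directions, the initial membership matrix reflects document content from the first iteration, which is the intuition the abstract contrasts with random initialization.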