Text Clustering Based on Feature Space

Jian-yu HUANG, Ai-wu ZHOU, Yun XIAO, Tian-cheng TAN
DOI: https://doi.org/10.3969/j.issn.1673-629X.2017.09.016
2017-01-01
Abstract: Text clustering is a specific application of clustering algorithms. With the development of the Internet, text clustering has become increasingly widely used in fields such as information retrieval and intelligent search engines. A text clustering algorithm consists primarily of text preprocessing and the clustering step itself, so improvements have been made to both. Traditional text clustering adopts the vector space model (VSM) without considering the semantic similarity and correlation between words, which leads to low accuracy. To address this, a text clustering method based on feature space is proposed: it constructs an alternative word library from the feature space of the document collection, derives each document's theme from that library, and then replaces the words in the document according to the theme and its corresponding domain dictionary. In addition, the traditional text clustering algorithm requires the K value to be specified manually. Therefore, a K-means algorithm based on K value optimization is presented. The experimental results show that these two improvements make text clustering more intelligent and more precise.
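The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the alternative word library is built or how K is optimized. Here the domain dictionary (DOMAIN_DICT) is a hypothetical hand-written mapping, and K is chosen by the mean silhouette score, which is a commonly used criterion swapped in as an assumption. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical domain dictionary: maps variant words to one canonical term.
DOMAIN_DICT = {"automobile": "car", "vehicle": "car", "soccer": "football"}

def vectorize(docs):
    """Replace words via DOMAIN_DICT, then build term-frequency (VSM) vectors."""
    tokenized = [[DOMAIN_DICT.get(w, w) for w in d.lower().split()] for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    return [[float(Counter(toks)[w]) for w in vocab] for toks in tokenized]

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=50):
    """Plain K-means with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette score; requires at least two distinct labels."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k(points, k_max):
    """Assumed K-selection rule: the K in [2, k_max] with the best silhouette."""
    best_k, best_score = 2, -1.0
    for k in range(2, k_max + 1):
        labels = kmeans(points, k)
        if len(set(labels)) < 2:
            continue  # degenerate result, silhouette undefined
        score = silhouette(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

if __name__ == "__main__":
    docs = ["car engine repair", "automobile engine repair", "vehicle engine",
            "football match score", "soccer match score", "football league"]
    vectors = vectorize(docs)
    k = choose_k(vectors, k_max=3)
    print(k, kmeans(vectors, k))
```

Without the replacement step, "car engine repair" and "automobile engine repair" would differ in the VSM because "car" and "automobile" occupy separate dimensions; after replacement they map to the same vector, which is the kind of semantic gap the abstract attributes to the plain VSM.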