Word Distributed Representation Based Text Clustering.

Shan Feng, Ruifang Liu, Qinlong Wang, Ruisheng Shi
DOI: https://doi.org/10.1109/ccis.2014.7175766
2014-01-01
Abstract: The rapid growth of web documents on the Internet poses new challenges for managing and retrieving textual collections efficiently and accurately, and text clustering plays a significant role in meeting them. Traditional document clustering is an unsupervised categorization of a given document collection based on the vector space model, which represents each document as a highly sparse vector. In this paper, we propose a method that addresses these shortcomings using distributed word representations (word vectors) obtained from a neural probabilistic language model. To improve the document vector representation and enhance clustering accuracy, we first compute semantic similarities between words using the word embedding vectors and then expand the keywords of each document. Experimental results show that the method can improve clustering accuracy.
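The keyword-expansion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a tiny hypothetical embedding table (`EMBEDDINGS`) in place of vectors learned by a neural probabilistic language model. It also assumes hypothetical parameters `threshold` and `top_k` for choosing neighbours, and it weights each added word by its cosine similarity. The sketch shows why expansion helps: two documents that share no words have zero cosine similarity in the sparse vector space model, but after expansion they overlap.

```python
import math
from collections import Counter

# Hypothetical toy word vectors; stand-ins for embeddings learned by a
# neural probabilistic language model (the paper's vectors are not given).
EMBEDDINGS = {
    "football": [0.9, 0.1, 0.0],
    "soccer": [0.85, 0.15, 0.05],
    "match": [0.7, 0.2, 0.1],
    "stock": [0.0, 0.9, 0.2],
    "market": [0.1, 0.85, 0.25],
    "shares": [0.05, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two dense word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sparse_cosine(p, q):
    """Cosine similarity between two sparse term-weight dicts (vector space model)."""
    dot = sum(w * q.get(t, 0.0) for t, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def expand_keywords(keywords, embeddings, threshold=0.95, top_k=2):
    """Expand a document's keywords with semantically similar words.

    Original keywords keep their term counts; each added word gets a weight
    equal to its cosine similarity to the keyword that pulled it in (the
    strongest link wins). threshold and top_k are illustrative assumptions.
    """
    weights = dict(Counter(keywords))
    for kw in keywords:
        if kw not in embeddings:
            continue
        # Rank all other vocabulary words by similarity to this keyword.
        neighbours = sorted(
            ((cosine(embeddings[kw], vec), w) for w, vec in embeddings.items() if w != kw),
            reverse=True,
        )[:top_k]
        for sim, w in neighbours:
            if sim >= threshold and w not in keywords:
                weights[w] = max(weights.get(w, 0.0), sim)
    return weights


if __name__ == "__main__":
    doc_a = ["football", "match"]
    doc_b = ["soccer"]
    doc_c = ["stock", "market"]
    # Without expansion, doc_a and doc_b share no terms.
    print("before:", sparse_cosine(Counter(doc_a), Counter(doc_b)))
    ea, eb, ec = (expand_keywords(d, EMBEDDINGS) for d in (doc_a, doc_b, doc_c))
    print("after (a, b):", round(sparse_cosine(ea, eb), 3))
    print("after (a, c):", round(sparse_cosine(ea, ec), 3))
```

In this sketch the expanded term-weight dicts would then be passed to an ordinary clustering algorithm such as k-means. Weighting added words by similarity, instead of by a full count, is one plausible design choice. It keeps the expanded terms from overpowering the words that actually occur in the document.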