Feature extension of cluster analysis based on Microblog

Xulei DUAN, Yangsen ZHANG, Zhengbin GUO
DOI: https://doi.org/10.3778/j.issn.1002-8331.1606-0438
2017-01-01
Abstract: Microblogs have become a major medium through which information is generated and spread today. However, Microblog information differs from that found on news Web pages or blogs: Microblog texts are high-dimensional and sparse, which poses great challenges for Microblog text processing. Based on these characteristics, this paper compares short-text expansion strategies based on HowNet and Cilin, and proposes using Word2vec to train on a Microblog corpus and build a vocabulary of related words in the Microblog context. Seed words and Microblog label information are then used to expand Microblog texts, and the paper puts forward methods for extracting Microblog text keywords and for distinguishing similar words from related words. Finally, experiments show that expanding Microblog texts with Word2vec works better than the HowNet- and Cilin-based strategies and significantly improves the results of cluster analysis on Microblog texts.
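The expansion step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes word vectors have already been trained (for example with Word2vec on a Microblog corpus) and stands them in with a hypothetical three-dimensional toy table `EMBEDDINGS`. The helper names `cosine`, `most_similar`, and `expand_text`, the `topn` and `threshold` parameters, and the choice to add label words to the text and also use them as seeds are all assumptions made for illustration.

```python
import math

# Hypothetical toy vectors standing in for embeddings that Word2vec would
# learn from a Microblog corpus (real vectors have far more dimensions).
EMBEDDINGS = {
    "phone":   [0.9, 0.1, 0.0],
    "mobile":  [0.85, 0.15, 0.05],
    "battery": [0.7, 0.3, 0.1],
    "match":   [0.0, 0.9, 0.2],
    "goal":    [0.05, 0.85, 0.3],
    "rain":    [0.1, 0.0, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, embeddings, topn):
    """Return up to topn (neighbour, similarity) pairs, most similar first."""
    if word not in embeddings:
        return []
    target = embeddings[word]
    scored = [(other, cosine(target, vec))
              for other, vec in embeddings.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def expand_text(tokens, labels, embeddings, topn=2, threshold=0.9):
    """Expand a short text with label words and close embedding neighbours.

    Assumption: label (e.g. topic tag) words are appended to the text, and
    both the original tokens and the labels act as seed words. A neighbour
    is appended only if its similarity reaches `threshold` and it is not
    already in the expanded text.
    """
    expanded = list(tokens)
    seen = set(tokens)
    for label in labels:
        if label not in seen:
            expanded.append(label)
            seen.add(label)
    for seed in list(tokens) + list(labels):
        for neighbour, sim in most_similar(seed, embeddings, topn):
            if sim >= threshold and neighbour not in seen:
                expanded.append(neighbour)
                seen.add(neighbour)
    return expanded


if __name__ == "__main__":
    print(expand_text(["phone"], [], EMBEDDINGS))
    # → ['phone', 'mobile', 'battery']
    print(expand_text(["rain"], ["match"], EMBEDDINGS))
    # → ['rain', 'match', 'goal']
```

In the second call the word "goal" enters the text only through the label "match", which shows how label information can supply features that the sparse original text lacks. With real trained vectors, a library such as gensim provides the nearest-neighbour query (`KeyedVectors.most_similar`), and the expanded token lists would then be vectorized and clustered.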