Research on Text Clustering of Micro-Blog Public Opinion: Word Sense Cluster and Collocation-Based Method

Hengjing Wang, Cungen Cao, Shang Gao
DOI: https://doi.org/10.3969/j.issn.1001-4616.2015.01.009
2015-01-01
Abstract: Micro-blogs are a recently emerged internet platform for information exchange, characterized by dispersed themes, short texts, and free style, and they can have a huge impact on society. Information supervision departments and commercial enterprises therefore have an urgent demand for public opinion analysis based on micro-blog information. This paper presents a novel collocation-based method for text clustering. The method first preprocesses the micro-blog text, then uses a word sense clustering model to extract effective collocations automatically, and finally clusters the texts based on those effective collocations. Experiments show that text clustering using word sense clusters is 6.3% more efficient than the traditional text clustering method, and that the method of this paper achieves a rate 16.8% higher than text clustering using word sense clusters. The results show the validity of our method.
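The three-stage pipeline in the abstract (preprocessing, collocation extraction, collocation-based clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the word sense clustering model, so plain document-level co-occurrence counting stands in for it here, and a greedy Jaccard-similarity pass stands in for the clustering step. The function names, the `min_count` and `threshold` parameters, and the toy data are all hypothetical, and the input is assumed to be already tokenized.

```python
from collections import Counter
from itertools import combinations


def extract_collocations(docs, min_count=2):
    """Return word pairs that co-occur in at least min_count documents.

    Assumption: raw co-occurrence replaces the paper's word sense
    clustering model, which the abstract does not describe in detail.
    """
    counts = Counter()
    for tokens in docs:
        # Count each unordered word pair once per document.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return {pair for pair, c in counts.items() if c >= min_count}


def doc_features(tokens, collocations):
    """Represent a document by the collocations it contains."""
    present = set(tokens)
    return {p for p in collocations if p[0] in present and p[1] in present}


def cluster_by_collocation(docs, collocations, threshold=0.3):
    """Greedy single-pass clustering on Jaccard similarity of features."""
    clusters = []  # each entry: (features of the seed document, member indices)
    for i, tokens in enumerate(docs):
        feats = doc_features(tokens, collocations)
        for seed, members in clusters:
            union = feats | seed
            if union and len(feats & seed) / len(union) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            clusters.append((feats, [i]))
    return [members for _, members in clusters]


# Toy, pre-tokenized "micro-blog" posts (hypothetical data).
docs = [
    ["flight", "delay", "airport"],
    ["airport", "flight", "delay", "angry"],
    ["phone", "battery", "price"],
    ["battery", "phone", "price", "cheap"],
]
collocations = extract_collocations(docs)
clusters = cluster_by_collocation(docs, collocations)
print(clusters)
```

On this toy data the two flight-related posts and the two phone-related posts end up in separate groups. Representing short texts by shared collocations rather than single words is one way to reduce the sparsity that very short posts cause in bag-of-words clustering.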