Feature Dimension Reduction Short Text Clustering Combined with Semantic and Statistics

YANG Wan-xia, SUN Li-he, HUANG Yong-feng
DOI: https://doi.org/10.3969/j.issn.1000-3428.2012.22.042
2012-01-01
Abstract: The primary difficulty of text clustering lies in the high dimensionality and sparseness of text representations. This paper proposes a short text clustering algorithm that takes both semantic and statistical features into account. A first dimensionality reduction is achieved by analyzing the semantic relatedness of words with a semantic dictionary. A second reduction is then completed by feature selection using statistical methods. Short texts are clustered on the feature space obtained by combining the two reductions. Experimental results show that the algorithm achieves better clustering quality and efficiency on short texts.
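The two-stage reduction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `CONCEPTS` mapping is a hypothetical toy stand-in for the semantic dictionary, and document frequency with a `min_df` threshold is an assumed choice of statistical selection criterion, since the abstract does not name the statistic used.

```python
from collections import Counter

# Hypothetical toy "semantic dictionary": maps related words to one concept.
# The paper's actual dictionary and relatedness measure are not given here.
CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "movie": "film", "film": "film", "cinema": "film",
}

def semantic_reduce(tokens):
    """Stage 1: merge semantically related words into a single feature."""
    return [CONCEPTS.get(t, t) for t in tokens]

def select_features(docs, min_df=2):
    """Stage 2 (assumed criterion): keep features found in >= min_df documents."""
    df = Counter(f for doc in docs for f in set(doc))
    return sorted(f for f, n in df.items() if n >= min_df)

def vectorize(doc, features):
    """Represent a document as term counts over the selected features."""
    counts = Counter(doc)
    return [counts[f] for f in features]

texts = [
    "new car review",
    "automobile price drops",
    "truck sales rise",
    "great movie tonight",
    "cinema ticket price",
]

raw_vocab = {t for text in texts for t in text.split()}
docs = [semantic_reduce(text.split()) for text in texts]
features = select_features(docs)
vectors = [vectorize(d, features) for d in docs]

print(len(raw_vocab), "raw features ->", len(features), "selected:", features)
for text, vec in zip(texts, vectors):
    print(vec, text)
```

In this toy corpus the three vehicle texts share no surface word, yet after the semantic merge they share a feature, which is the effect the first reduction is meant to have on sparse short texts. The resulting low-dimensional vectors could then be passed to any standard clustering algorithm.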