Algorithm of text feature selection based on vocabulary attribute clustering

Qun Zhang, Hongjun Wang, Lunwen Wang
DOI: https://doi.org/10.3969/j.issn.1001-3695.2017.02.011
2017-01-01
Abstract: Effective text feature selection is a precondition for text mining. Conventional text feature selection methods have limited effect on reducing feature vector dimensionality and on text representation, and they are not suitable for unsupervised text clustering. To address this, this paper proposes a novel text feature selection algorithm, based on the concept of vocabulary attributes, that is suitable for text clustering. First, the algorithm constructs a model based on vocabulary attributes, including term frequency, document frequency, term position, and term correlation. It then analyzes in detail how each attribute value is calculated and improves the Apriori algorithm to calculate the term correlation attribute. Finally, it clusters on the vocabulary attribute model using an improved K-means clustering algorithm to complete the text feature selection. Experimental results show that, compared with traditional methods, the proposed scheme effectively reduces the dimension of the feature vector, improves the text representation capability of the feature vocabulary, and meets the practical demands of text clustering.
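The pipeline outlined in the abstract (build a per-term attribute vector, then cluster the terms) can be illustrated with a minimal Python sketch. It is not the authors' implementation, since the abstract does not give the attribute formulas, the Apriori improvement, or the K-means improvement. The following are assumptions made for illustration only: the position score (one minus the relative index of a term's first occurrence), the correlation score (a plain document-level co-occurrence count standing in for the improved Apriori step), standard K-means seeded with the first k points, and the rule of keeping the cluster whose center has the largest attribute sum. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def term_attributes(docs):
    """Return {term: [tf, df, position, correlation]} for tokenized documents.

    The position and correlation formulas are illustrative assumptions,
    not the definitions used in the paper.
    """
    tf, df, pos_sum, cooc = Counter(), Counter(), Counter(), Counter()
    for doc in docs:
        tf.update(doc)
        first = {}
        for i, term in enumerate(doc):
            first.setdefault(term, i)          # index of first occurrence
        for term, i in first.items():
            df[term] += 1
            pos_sum[term] += 1.0 - i / len(doc)  # earlier terms score higher
        # Count document-level co-occurrences (stand-in for Apriori).
        for a, b in combinations(sorted(first), 2):
            cooc[a] += 1
            cooc[b] += 1
    n = len(docs)
    return {t: [tf[t], df[t] / n, pos_sum[t] / df[t], cooc[t] / df[t]]
            for t in tf}


def kmeans(points, k, iters=20):
    """Plain K-means with deterministic seeding from the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def select_features(docs, k=2):
    """Cluster terms by min-max normalized attributes and keep one cluster.

    Keeping the cluster with the largest center sum is an assumed rule.
    """
    attrs = term_attributes(docs)
    terms = sorted(attrs)
    cols = list(zip(*(attrs[t] for t in terms)))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    points = [[(v - l) / s for v, l, s in zip(attrs[t], lo, span)]
              for t in terms]
    labels, centers = kmeans(points, k)
    best = max(range(k), key=lambda c: sum(centers[c]))
    return [t for t, lab in zip(terms, labels) if lab == best]
```

Calling `select_features` on a list of token lists returns the terms in the retained cluster. In practice the number of clusters `k` and the selection rule would need to follow the paper's own definitions.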