Abstract: In text classification, many classifiers cannot handle high-dimensional feature spaces, so it is important to filter redundant information from the original feature space efficiently and retain the highest-quality features. On this basis, this paper proposes a new two-step feature selection method. First, four definitions are introduced (word semantic correlation, set semantic correlation, semantic correlative and semantic correlative set), and an algorithm for generating semantic correlative sets is given. Second, the two-step feature selection process is described: in the first step, a feature subset is obtained using an optimal feature selection method, and a set of semantic correlative sets is generated from the selected subset; in the second step, redundant information among the selected features is filtered out using the generated semantic correlative sets. Finally, to avoid local optima when searching for the best threshold, the concept of a memory recall position is introduced and an improved fruit fly optimization algorithm based on a memory recall mechanism is proposed. In the experiments, two typical classifiers, support vector machine and naïve Bayes, are evaluated on four datasets (Reuters50, SMSSPAS, WebKB and 20-Newsgroups) using 10-fold cross-validation, with F1 and the receiver operating characteristic (ROC) curve as measures. Experimental results show that the proposed method achieves higher accuracy than several representative traditional feature selection methods and runs faster than typical mutual-information-based feature selection methods, illustrating its effectiveness at selecting the best features in the text classification field.
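
The two-step process described above can be illustrated with a minimal sketch. The abstract does not define word semantic correlation or the set-generation algorithm, so the sketch makes several assumptions: cosine similarity between hypothetical word vectors stands in for word semantic correlation, a simple greedy grouping stands in for the generation of semantic correlative sets, and the threshold is fixed by hand rather than searched with the fruit fly optimization algorithm. The function names, scores and vectors are all illustrative and are not taken from the paper.

```python
import math


def select_top_k(scores, k):
    """Step 1: keep the k features with the highest relevance score.

    `scores` maps feature -> score from any base feature selection method
    (assumption: the paper's first-step method is not specified here).
    """
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:k]


def cosine(u, v):
    """Cosine similarity, used here as a stand-in for semantic correlation."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def correlative_sets(features, vectors, threshold):
    """Greedily group features whose similarity to a set's first member
    reaches the threshold (assumed grouping rule, not the paper's algorithm)."""
    sets = []
    for f in features:
        for group in sets:
            if cosine(vectors[f], vectors[group[0]]) >= threshold:
                group.append(f)
                break
        else:
            # No existing set is similar enough, so start a new one.
            sets.append([f])
    return sets


def two_step_select(scores, vectors, k, threshold):
    """Step 1 selects a subset; step 2 keeps one feature per correlative set."""
    subset = select_top_k(scores, k)
    groups = correlative_sets(subset, vectors, threshold)
    # The subset is ordered by score, so group[0] is the best-scoring member.
    return [group[0] for group in groups]


# Toy data (hypothetical scores and 2-d word vectors).
scores = {"goal": 0.9, "score": 0.8, "match": 0.7, "stock": 0.6, "the": 0.1}
vectors = {
    "goal": [1.0, 0.0],
    "score": [0.9, 0.1],
    "match": [0.8, 0.3],
    "stock": [0.0, 1.0],
    "the": [0.5, 0.5],
}
selected = two_step_select(scores, vectors, k=4, threshold=0.95)
print(selected)
```

With this toy data, "score" falls into the same set as "goal" and is dropped as redundant, while "match" and "stock" each form their own set. In the method described in the abstract, the threshold would be tuned by the optimization algorithm rather than fixed.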

Text Feature Selection Based on Class Subspace

CLDA: Feature Selection for Text Categorization Based on Constrained LDA

Improving Short Text Classification Through Better Feature Space Selection

A New Feature Selection Method for Text Classification Based on Independent Feature Space Search

Aggressive Dimensionality Reduction With Reinforcement Local Feature Selection For Text Categorization

A Feature Selection Method Based on Class Feature Domains for Text Categorization

Efficient Method for Feature Selection in Text Classification

Feature reduction methods for text classification

Feature selection method based on category discriminability

A Method of Feature Selection Based on Word2Vec in Text Categorization

An Effective Feature Selection Method For Text Categorization

New Feature Selection Approach(cdf) for Text Categorization

A Discriminative and Semantic Feature Selection Method for Text Categorization

A New Feature Selection Method for Handling Redundant Information in Text Classification

Two-step Based Feature Selection Method for Filtering Redundant Information

A New Approach To Feature Selection For Text Categorization

Subspace Sparse Discriminative Feature Selection

Feature Selection Method Based on Multiple Centrifuge Models

Feature Selection for Support Vector Machines in Text Categorization

Review of Feature Dimension Reduction in Text Classification

Subspace Learning for Unsupervised Feature Selection Via Matrix Factorization