Abstract: In text classification, many classifiers cannot cope with high-dimensional feature spaces, so it is important to filter redundant information out of the original feature space efficiently and retain the highest-quality features. To this end, this paper proposes a new two-step feature selection method. First, several definitions (word semantic correlation, set semantic correlation, semantic correlative, and semantic correlative set) are introduced, and an algorithm for generating semantic correlative sets is given. Second, the two-step feature selection process is described: in the first step, a feature subset is obtained with an optimal feature selection method, and a collection of semantic correlative sets is generated from the selected subset; in the second step, redundant information among the selected features is filtered out using the generated semantic correlative sets. Finally, to avoid local optima when searching for the best threshold, the concept of a memory recall position is introduced and an improved fruit fly optimization algorithm based on a memory recall mechanism is proposed. In the experiments, two typical classifiers, support vector machine (SVM) and naïve Bayes, are applied to four datasets (Reuters50, SMSSPAS, WebKB, and 20-Newsgroups) with 10-fold cross-validation, using the F1 measure and the receiver operating characteristic (ROC) curve for evaluation. Experimental results show that the proposed method achieves higher accuracy than several representative traditional feature selection methods and runs faster than typical mutual-information-based feature selection methods, demonstrating its effectiveness at selecting the best features for text classification.
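
The abstract does not give the formal definitions, so the following Python sketch only illustrates the overall shape of the two-step procedure under stated assumptions: word semantic correlation is approximated by cosine similarity between word vectors, the first-step scores (for example chi-square or mutual information) are assumed to be precomputed, and each semantic correlative set is built greedily around its highest-scoring member. The names `cosine`, `semantic_correlative_sets`, `two_step_select`, and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def semantic_correlative_sets(features, vectors, threshold):
    """Greedily group features around a seed feature.

    ASSUMPTION: "word semantic correlation" is approximated by cosine
    similarity of word vectors; the paper's own definition may differ.
    Features are expected in descending score order, so each set's seed
    (its first member) is its highest-scoring feature.
    """
    sets = []
    for f in features:
        for s in sets:
            if cosine(vectors[f], vectors[s[0]]) >= threshold:
                s.append(f)
                break
        else:  # no existing set is correlated enough: start a new one
            sets.append([f])
    return sets


def two_step_select(scores, vectors, k, threshold):
    # Step 1: keep the top-k features by a first-stage score (assumed given).
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    groups = semantic_correlative_sets(top, vectors, threshold)
    # Step 2: filter redundancy by keeping one representative per set.
    return [max(g, key=scores.get) for g in groups]


# Toy, made-up scores and 3-D word vectors for illustration only.
scores = {"car": 0.9, "automobile": 0.8, "engine": 0.7, "banana": 0.6, "fruit": 0.5}
vectors = {
    "car": [1.0, 0.1, 0.0],
    "automobile": [0.95, 0.15, 0.0],
    "engine": [0.7, 0.7, 0.0],
    "banana": [0.0, 0.1, 1.0],
    "fruit": [0.0, 0.2, 0.9],
}
print(two_step_select(scores, vectors, k=4, threshold=0.9))  # ['car', 'engine', 'banana']
```

With `threshold=0.9`, "car" and "automobile" fall into the same set, so the second step keeps only "car"; a lower threshold merges more features and removes more redundancy, which is why the choice of threshold matters.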

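The abstract names a fruit fly optimization algorithm (FOA) with a memory recall mechanism for the threshold search but does not describe that mechanism, so the sketch below is illustrative only. It implements a simplified one-dimensional FOA in Python: flies sample random positions around a swarm centre (the smell-based phase), and the swarm moves to the best one only when it improves (the vision-based phase). Unlike the classic FOA, which maps each fly's 2-D position to a value S = 1 / distance, this sketch uses the fly's position directly as the candidate threshold. The memory step is a hypothetical stand-in, not the paper's memory recall position: stall points are remembered and the swarm restarts away from them while the global best is kept. The function `foa_with_memory` and all of its parameters are assumptions.

```python
import math
import random


def foa_with_memory(fitness, lo, hi, n_flies=20, iters=200, radius=0.5,
                    patience=15, seed=0):
    """Maximize a 1-D `fitness` on [lo, hi] with a simplified fruit fly search.

    HYPOTHETICAL memory mechanism: positions where the swarm stalls are
    remembered, and the swarm restarts away from them; the global best is kept.
    Returns (best_x, best_fitness, remembered_stall_positions).
    """
    rng = random.Random(seed)
    center = rng.uniform(lo, hi)
    center_f = fitness(center)
    best_x, best_f = center, center_f
    memory = []  # remembered stall positions (hypothetical "recall positions")
    stall = 0
    for _ in range(iters):
        # Smell-based search: every fly samples a random point around the centre.
        flies = [min(hi, max(lo, center + rng.uniform(-radius, radius)))
                 for _ in range(n_flies)]
        f, x = max((fitness(p), p) for p in flies)
        # Vision-based move: the swarm flies to the best point only if it improves.
        if f > center_f:
            center, center_f, stall = x, f, 0
        else:
            stall += 1
        if stall >= patience:
            # Memory step: store the stall point, then restart at a random
            # position farther than 2 * radius from every remembered point
            # (falls back to the last random draw after 100 attempts).
            memory.append(center)
            for _ in range(100):
                cand = rng.uniform(lo, hi)
                if all(abs(cand - m) > 2 * radius for m in memory):
                    break
            center, center_f, stall = cand, fitness(cand), 0
        # Keep the global best across restarts.
        if center_f > best_f:
            best_x, best_f = center, center_f
    return best_x, best_f, memory


if __name__ == "__main__":
    # Toy stand-in for "cross-validated F1 at threshold t" (hypothetical):
    # a local peak near t = 0.3 and a higher peak near t = 0.8.
    def toy(t):
        return (math.exp(-((t - 0.3) / 0.1) ** 2)
                + 1.5 * math.exp(-((t - 0.8) / 0.1) ** 2))

    t, score, stalls = foa_with_memory(toy, 0.0, 1.0, radius=0.05)
    print(f"threshold={t:.3f} fitness={score:.3f} restarts={len(stalls)}")
```

In practice `fitness` would wrap an expensive evaluation, such as a classifier's cross-validated F1 at a candidate threshold, so the number of evaluations (roughly `n_flies * iters`, plus one per restart) is the main cost to budget for.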
Two-step Based Feature Selection Method for Filtering Redundant Information