Abstract: Existing feature selection methods often neglect the distribution of the data, and most methods based on neighborhood rough sets (NRS) require the neighborhood radius to be selected manually. These limitations can lead to misclassified samples. To address these drawbacks, this paper presents a mixed measure-based feature selection method using the Fisher score and an NRS model. First, the variation coefficient of each feature across the decision classes is defined to describe that feature's degree of dispersion. On this basis, the neighborhood class is defined to build a novel NRS model. The concepts of dependency degree, neighborhood knowledge granularity, and average neighborhood entropy are then defined, and a mixed measure combining the information and algebraic views is proposed to measure the uncertainty in neighborhood decision systems. Second, the average correlation degree of the reduced feature subset is computed to assess its redundancy. An adaptive neighborhood radius function is constructed from three quantities: the classification accuracy of the selected features, the reduction rate of the classification result, and the average correlation degree of the reduced feature set. This function avoids selecting the optimal neighborhood radius manually. An optimal feature subset is then obtained according to the internal and external significance of the features. Third, for each feature, the variation coefficient of the samples in the different decision classes is defined to compute the samples' degree of dispersion. The mean of all samples on each feature is also added to the between-class scatter to eliminate the effect of the features' different measurement scales. With these changes, the Fisher score model is improved to remove noise from high-dimensional data. Finally, a heuristic feature selection algorithm that combines the improved Fisher score with the new NRS model is designed to select an optimal feature subset.
Experimental results on five low-dimensional UCI datasets and nine high-dimensional gene expression datasets show that the proposed algorithm is effective. Compared with several recent algorithms, it selects an optimal reduced subset with high classification accuracy.
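The dependency degree and feature significance mentioned in the abstract follow a common NRS pattern. The sketch below illustrates that pattern under stated assumptions. It uses the classic fixed-radius δ-neighborhood with Euclidean distance. It does not reproduce the paper's variation-coefficient-based neighborhood class, mixed measure, or adaptive radius function, because the abstract does not specify them in enough detail. The sketch computes the dependency degree γ_B(D) = |POS_B(D)| / |U|. It then runs a forward greedy search that repeatedly adds the feature with the largest dependency gain, which is its external significance. The search stops when no feature increases the dependency. The function names `neighborhood_dependency` and `forward_greedy_reduct` are hypothetical.

```python
import numpy as np


def neighborhood_dependency(X, y, features, delta):
    """Dependency degree gamma_B(D) = |POS_B(D)| / |U| for a classic
    fixed-radius delta-neighborhood rough set (assumed Euclidean distance)."""
    Xb = X[:, features]
    # Pairwise Euclidean distances on the candidate feature subset B
    diff = Xb[:, None, :] - Xb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # neighbors[i, j] is True when sample j lies in the delta-neighborhood of i
    neighbors = dist <= delta
    same_label = y[None, :] == y[:, None]
    # Sample i is in the positive region if every neighbor shares its label
    in_pos = np.all(same_label | ~neighbors, axis=1)
    return in_pos.sum() / len(y)


def forward_greedy_reduct(X, y, delta):
    """Hypothetical greedy search: add the feature with the largest
    dependency gain until no remaining feature increases the dependency."""
    selected, remaining = [], list(range(X.shape[1]))
    best = 0.0
    while remaining:
        # Dependency after adding each remaining candidate feature
        scores = [(neighborhood_dependency(X, y, selected + [f], delta), f)
                  for f in remaining]
        score, f = max(scores)
        if score <= best:  # no positive external significance left
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected
```

Consider the data `X = np.array([[0.1], [0.2], [0.8], [0.9]])` with labels `y = np.array([0, 0, 1, 1])`. A radius of 0.15 gives a dependency of 1.0. A radius of 0.65 lets samples from different classes fall into each other's neighborhoods, and the dependency drops to 0.5. This sensitivity to δ is the issue that the paper's adaptive radius function is intended to avoid.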

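The improved Fisher score in the third step builds on the classic Fisher score. The classic score ranks each feature j by the ratio of between-class to within-class scatter, F_j = Σ_k n_k (μ_kj − μ_j)² / Σ_k n_k σ_kj². The sketch below computes that classic score. It also computes the per-class coefficient of variation, defined as the standard deviation divided by the absolute mean. It does not reproduce the paper's modification, which adds the mean of all samples to the between-class scatter in a form the abstract does not give. The function names `fisher_score` and `class_variation_coefficients` are hypothetical.

```python
import numpy as np


def fisher_score(X, y):
    """Classic Fisher score per feature:
    F_j = sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k sigma_kj^2."""
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - mu) ** 2
        within += n_c * Xc.var(axis=0)  # population variance per class
    # Small epsilon avoids division by zero for features constant within classes
    return between / (within + 1e-12)


def class_variation_coefficients(X, y):
    """Coefficient of variation (std / |mean|) of each feature within each
    class; rows follow the order of np.unique(y)."""
    rows = []
    for c in np.unique(y):
        Xc = X[y == c]
        rows.append(Xc.std(axis=0) / (np.abs(Xc.mean(axis=0)) + 1e-12))
    return np.vstack(rows)
```

The test case uses a toy set in which feature 0 separates the two classes and feature 1 has identical class means. There, feature 0 scores about 100 and feature 1 scores 0. The small epsilon only guards against division by zero and leaves the ranking unchanged in ordinary cases.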
Mixed Measure-Based Feature Selection Using the Fisher Score and Neighborhood Rough Sets

Feature Selection for Monotonic Classification

Feature Selection for Unbalanced Distribution Hybrid Data Based on ${K}$-Nearest Neighborhood Rough Set

Feature Selection with Integrated Relevance and Redundancy Optimization

A New Method for Redundancy Analysis in Feature Selection

A novel semi-supervised hyperspectral image classification approach based on spatial neighborhood information and classifier combination

Efficient Semi-Supervised Feature Selection with Noise Insensitive Trace Ratio Criterion

LSFSR: Local label correlation-based sparse multilabel feature selection with feature redundancy

Joint local structure preservation and redundancy minimization for unsupervised feature selection