Abstract: Sparse-regression-based feature selection has been extensively investigated in recent years. However, the problem is very hard to solve because it carries a non-convex constraint, namely the $\ell_{2,0}$-norm constraint. Unlike most existing methods, which solve only a relaxed version by adding a sparsity regularizer to the objective function, this paper proposes a novel framework that solves the original $\ell_{2,0}$-norm constrained sparse-regression-based feature selection problem. Using a new label coding method, we transform the objective function into Linear Discriminant Analysis (LDA), which lets the model compute each feature's ratio of inter-class scatter to intra-class scatter, the most widely used metric of feature discrimination. Features can then be selected by simply sorting on that ratio. To further improve performance, a projected gradient descent algorithm is run with the sorting-based solution as its initial point, which ensures the stability of this iterative algorithm. We prove that the proposed method attains the global optimum of this non-convex problem when all features are statistically independent. For the general case in which features are statistically dependent, extensive experiments on six small-sample-size datasets and one large-scale dataset show that, with an SVM classifier, our algorithm achieves classification performance comparable to or better than eight other state-of-the-art feature selection methods. We also show that our algorithm obtains a low loss value, which means its solution can come very close to the true solution of this NP-hard problem. Moreover, because we solve the original $\ell_{2,0}$-norm constrained problem, we avoid the heavy work of tuning a regularization parameter: the only parameter has an explicit meaning, namely the number of selected features. Finally, we experimentally evaluate the stability of our algorithm from two perspectives, the objective function values and the selected features; from both, it shows satisfactory stability.
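The two-stage idea described in the abstract (rank features by their scatter ratio, then refine with projected gradient descent under a row-sparsity constraint) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fisher_ratio`, `project_l20`, and `select_features` are hypothetical, plain centered one-hot labels stand in for the paper's label coding (which the abstract does not specify), and the loss is assumed to be ordinary least squares.

```python
import numpy as np

def fisher_ratio(X, y):
    """Per-feature ratio of inter-class scatter to intra-class scatter."""
    mean_all = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        between += Xc.shape[0] * (mean_c - mean_all) ** 2
        within += ((Xc - mean_c) ** 2).sum(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero

def project_l20(W, k):
    """Project onto {W : at most k nonzero rows} by keeping the k largest-norm rows."""
    keep = np.argsort(np.linalg.norm(W, axis=1))[-k:]
    P = np.zeros_like(W)
    P[keep] = W[keep]
    return P

def select_features(X, y, k, n_iter=100):
    """Sort-based initialization followed by projected gradient descent."""
    classes = np.unique(y)
    # Assumption: centered one-hot labels, not the paper's label coding.
    Y = (y[:, None] == classes[None, :]).astype(float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Stage 1: pick the k features with the largest scatter ratio and
    # fit least squares on them to obtain the initial solution.
    top = np.argsort(fisher_ratio(X, y))[-k:]
    W = np.zeros((X.shape[1], len(classes)))
    W[top] = np.linalg.lstsq(Xc[:, top], Yc, rcond=None)[0]

    # Stage 2: gradient steps on 0.5 * ||Xc W - Yc||_F^2, each followed by
    # projection back onto the row-sparse set. Step size is 1 / Lipschitz constant.
    step = 1.0 / np.linalg.norm(Xc, 2) ** 2
    for _ in range(n_iter):
        grad = Xc.T @ (Xc @ W - Yc)
        W = project_l20(W - step * grad, k)

    # Indices of the k rows with the largest norm, in ascending order.
    return np.sort(np.argsort(np.linalg.norm(W, axis=1))[-k:])
```

In this sketch `k` plays the role the abstract assigns to the method's only parameter: it is directly the number of selected features, so no regularization weight needs tuning.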
