Abstract: Large amounts of high-dimensional unlabeled data typically contain only a small portion of truly useful information. Consequently, unsupervised feature selection has attracted significant research attention. However, current unsupervised feature selection approaches perform poorly on datasets with uneven density, and they also require substantial computation time. To address these problems, this article proposes a feature selection technique that combines Fuzzy C-Means (FCM) with $k$-nearest neighbor rough sets. FCM is a clustering algorithm grounded in fuzzy theory that takes into account the inherent data structure and the correlations between different features, which makes it particularly well suited to datasets with uneven density. The proposed method consists of three steps. First, the FCM algorithm is used to cluster the unlabeled data. Second, a feature-importance measure is defined from the clustering results, and the features are sorted by it. Finally, redundant features are filtered out using $k$-nearest neighbor rough sets while important features are retained, which significantly reduces the running time. Based on this method, we designed a feature selection algorithm (KND-UFS) and conducted experiments on 12 public datasets, comparing KND-UFS with eight existing algorithms in terms of running time, classification accuracy, and the number of selected features. The experimental results support the superior performance of the KND-UFS algorithm.
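The first two steps described in the abstract can be illustrated with a minimal sketch. It assumes the textbook FCM update rules with fuzzifier `m`; the importance score in `rank_features` (fuzzy between-cluster spread divided by fuzzy within-cluster spread, per feature) is a hypothetical stand-in, since the abstract does not state the measure that KND-UFS actually uses. The $k$-nearest neighbor rough set filtering step is not shown, and all function and parameter names are illustrative.

```python
import numpy as np


def _weighted_centers(X, U, m):
    """Cluster centers as membership-weighted means of the samples."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Textbook FCM. Returns (centers, U) where U[i, j] is the membership
    of sample i in cluster j and every row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        centers = _weighted_centers(X, U, m)
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return _weighted_centers(X, U, m), U


def rank_features(X, U, centers, m=2.0):
    """Hypothetical importance score (an assumption, not the paper's measure):
    fuzzy between-cluster spread over fuzzy within-cluster spread per feature.
    Returns (feature indices sorted by decreasing score, scores)."""
    W = U ** m
    overall = X.mean(axis=0)
    between = (W.sum(axis=0)[:, None] * (centers - overall) ** 2).sum(axis=0)
    within = (W[:, :, None] * (X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=(0, 1))
    scores = between / (within + 1e-12)
    return np.argsort(scores)[::-1], scores
```

Calling `fuzzy_c_means(X, n_clusters)` and then `rank_features(X, U, centers)` yields a feature ordering that a redundancy filter could then process from most to least important. The broadcasted distance computation allocates an array of shape (samples, clusters, features), so this form is meant for illustration rather than for very high-dimensional data.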

A Novel Unsupervised Feature Selection for High-Dimensional Data Based on FCM and $k$-Nearest Neighbor Rough Sets

$U^2F^2S^2$: Uncovering Feature-level Similarities for Unsupervised Feature Selection

An Unsupervised Feature Selection Method Based on Improved ReliefF and Bisecting K-means

Unsupervised Feature Selection with Ordinal Locality.

A New Unsupervised Feature Selection Algorithm Using Similarity-Based Feature Clustering.

Rethinking Embedded Unsupervised Feature Selection: A Simple Joint Approach

K-means Derived Unsupervised Feature Selection using Improved ADMM

Feature Selection for Unbalanced Distribution Hybrid Data Based on ${K}$-Nearest Neighborhood Rough Set

Unsupervised feature selection for multi-cluster data

Discriminatively embedded fuzzy K-Means clustering with feature selection strategy

KNCFS: Feature selection for high-dimensional datasets based on improved random multi-subspace learning

Unsupervised Feature Selection via Nonnegative Orthogonal Constrained Regularized Minimization

Dependence Guided Unsupervised Feature Selection

Unsupervised Feature Selection Algorithm Based on Dual Manifold Re-ranking

Clustering-Guided Sparse Structural Learning for Unsupervised Feature Selection

Unsupervised feature selection via dual space-based low redundancy scores and extended OLSDA

Feature Selection Approach Based on Improved Fuzzy C-Means with Principle of Refined Justifiable Granularity

Feature Selection Based on Data Clustering

Simultaneous local clustering and unsupervised feature selection via strong space constraint

Clustering-based feature subset selection with analysis on the redundancy–complementarity dimension