A Novel Unsupervised Feature Selection for High-Dimensional Data Based on FCM and $k$-Nearest Neighbor Rough Sets

Weihua Xu, Yang Zhang, Yuhua Qian
DOI: https://doi.org/10.1109/tnnls.2024.3460796
IF: 14.255
2024-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Large amounts of high-dimensional unlabeled data typically contain only a small portion of truly useful information. Consequently, unsupervised feature selection has gained significant research attention. However, current unsupervised feature selection approaches face limitations when dealing with datasets that exhibit uneven density, and they also require substantial computational time. To address these problems, this article proposes a feature selection technique that combines Fuzzy C-Means (FCM) and $k$-nearest neighbor rough sets. FCM is a clustering algorithm grounded in fuzzy theory, which takes into account the inherent data structure and the correlations between different features. Consequently, FCM is particularly well suited to datasets with uneven density. The proposed method consists of three steps. First, the FCM algorithm is used to cluster the unlabeled data. Second, a measure of feature importance is defined based on the clustering results, and the features are sorted by it. Finally, redundant features are filtered using $k$-nearest neighbor rough sets while important features are retained, significantly reducing the running time. In addition, we designed the feature selection algorithm (KND-UFS) and conducted experiments on 12 public datasets. We compared KND-UFS with eight existing algorithms in terms of running time, classification accuracy, and the number of selected features. The experimental results support the superior performance of the KND-UFS algorithm.
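
The first of the three steps, clustering the unlabeled data with FCM, can be illustrated with a minimal sketch of the textbook Fuzzy C-Means algorithm in NumPy. This is not the authors' KND-UFS code: the function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the random initialization, and the stopping rule are all assumptions made for illustration. The feature-importance measure and the $k$-nearest neighbor rough set filter are defined in the paper itself and are not reproduced here.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Textbook FCM (illustrative sketch). Returns (centers, memberships).

    X is an (n_samples, n_features) array; m > 1 is the fuzzifier;
    n_iter must be at least 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial membership matrix; each row sums to 1.
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers: membership-weighted means of the samples.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Usage on two synthetic, well-separated groups of points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
centers, U = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)  # hard assignment from the soft memberships
```

Each row of `U` holds one sample's soft membership in every cluster. A later feature-scoring step could use either these memberships or the hard labels derived from them; which one the paper uses is not stated in the abstract.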