Unsupervised Spectral Feature Selection Algorithms for High Dimensional Data

Mingzhao Wang, Henry Han, Zhao Huang, Juanying Xie
DOI: https://doi.org/10.1007/s11704-022-2135-0
IF: 2.6688
2023-01-01
Frontiers of Computer Science
Abstract: Detecting informative features for explainable analysis of high-dimensional data is a significant and challenging task, especially when the number of samples is very small. Feature selection, particularly unsupervised feature selection, is an effective way to meet this challenge. This paper therefore proposes two unsupervised spectral feature selection algorithms. They group features using an advanced Self-Tuning spectral clustering algorithm based on local standard deviation, so as to detect globally optimal feature clusters as far as possible. Two feature ranking techniques, cosine-similarity-based feature ranking and entropy-based feature ranking, are then proposed to detect the representative feature of each cluster. These representatives form the feature subset on which the explainable classification system is built. The proposed algorithms are tested on high-dimensional benchmark omics datasets and compared with peer methods, and statistical tests are conducted to determine whether the proposed spectral feature selection algorithms differ significantly from the peer methods. Extensive experiments show that the proposed unsupervised spectral feature selection algorithms outperform the peer methods, especially the one based on the cosine-similarity feature ranking technique, while the statistical test results show that the algorithm based on entropy feature ranking performs best. The detected features show strong discriminative capability in downstream classifiers for omics data, so AI systems built on them can be reliable and explainable. This is especially significant for building transparent and trustworthy medical diagnostic systems from an interpretable-AI perspective.
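The abstract does not spell out how the cosine-similarity-based ranking picks each cluster's representative. The following is a minimal sketch under one plausible assumption: every feature is scored by its mean cosine similarity to the other features in its cluster, and the top-scoring feature is kept. The function names `cosine` and `select_representatives` and the toy data are hypothetical. The cluster labels are taken as given rather than computed, so the Self-Tuning spectral clustering step itself is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_representatives(features, labels):
    """Return one representative feature index per cluster, ordered by cluster label.

    features: list of feature vectors (each holds one feature's values across samples)
    labels:   cluster label of each feature (assumed to come from spectral clustering)
    Assumed rule: score = mean cosine similarity to the other cluster members;
    the highest score wins, and ties go to the lowest feature index.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    selected = []
    for lab in sorted(clusters):
        members = clusters[lab]
        if len(members) == 1:
            # A singleton cluster is its own representative.
            selected.append(members[0])
            continue
        selected.append(max(
            members,
            key=lambda i: sum(cosine(features[i], features[j])
                              for j in members if j != i) / (len(members) - 1),
        ))
    return selected


# Hypothetical toy data: 5 features measured on 3 samples, grouped into 2 clusters.
features = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [0.9, 1.8, 3.2],
    [3.0, 0.0, 1.0],
    [2.8, 0.2, 1.1],
]
labels = [0, 0, 0, 1, 1]
print(select_representatives(features, labels))  # → [0, 3]
```

Feature 0 wins its cluster because it lies "between" features 1 and 2 and is the most similar to both on average. An entropy-based variant could keep the same cluster loop and swap only the scoring function, although the abstract does not say how that entropy score is defined.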