Semi-supervised feature selection based on discernibility matrix and mutual information

Qian, Wenbin
DOI: https://doi.org/10.1007/s10489-024-05481-3
IF: 5.3
2024-06-03
Applied Intelligence
Abstract: Feature selection is a vital technique for reducing data dimensionality. While many granular computing-based feature selection algorithms have been proposed, most treat feature selection as a supervised learning task that requires a large number of labeled instances. However, obtaining sufficient labeled data is expensive and time-consuming. To address this limitation, a novel semi-supervised feature selection framework is developed by leveraging both labeled and unlabeled data. Specifically, the discernibility matrix is used to measure feature relevance on the labeled data, and mutual information is employed to evaluate feature significance on the unlabeled data. By combining these supervised and unsupervised metrics, a greedy feature selection algorithm is proposed for semi-supervised learning scenarios. The proposed discernibility matrix and mutual information-based feature measurement can select more discriminative features to improve the generalization performance of the learning model. Finally, experiments conducted on ten UCI semi-supervised datasets demonstrate that the proposed approach achieves superior performance over five state-of-the-art semi-supervised feature selection methods.
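The abstract describes the general idea but not the exact formulas, so the following is only a minimal sketch of how a discernibility-based score on labeled data and a mutual-information score on unlabeled data might be combined in a greedy loop. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the weight `alpha`, the supervised score (share of differently labeled pairs that a feature newly discerns), and the unsupervised score (average mutual information with the other features). Features are assumed to be discrete.

```python
from collections import Counter
from itertools import combinations
from math import log2


def discernibility_matrix(X, y):
    """For every pair of labeled rows with different labels,
    record the set of feature indices on which the two rows differ."""
    entries = []
    for i, j in combinations(range(len(X)), 2):
        if y[i] != y[j]:
            entries.append({f for f in range(len(X[i])) if X[i][f] != X[j][f]})
    return entries


def mutual_information(a, b):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log2((c / n) / ((pa[u] / n) * (pb[v] / n)))
        for (u, v), c in pab.items()
    )


def select_features(X_lab, y_lab, X_unl, k, alpha=0.5):
    """Greedily pick k features (hypothetical scoring, see lead-in).

    alpha weights the supervised score against the unsupervised one.
    """
    n_feat = len(X_lab[0])
    entries = discernibility_matrix(X_lab, y_lab)

    # Unsupervised score (assumption): average MI with the other
    # features on the unlabeled data, scaled so the best feature gets 1.
    cols = [[row[f] for row in X_unl] for f in range(n_feat)]
    raw = [
        sum(mutual_information(cols[f], cols[g])
            for g in range(n_feat) if g != f) / max(n_feat - 1, 1)
        for f in range(n_feat)
    ]
    top = max(raw)
    if top <= 0:
        top = 1.0
    unsup = [r / top for r in raw]

    selected = []
    remaining = list(entries)  # pairs not yet discerned by selected features
    while len(selected) < min(k, n_feat):
        best, best_score = None, -1.0
        for f in range(n_feat):
            if f in selected:
                continue
            # Supervised score (assumption): share of all differently
            # labeled pairs that f would newly discern.
            sup = sum(1 for e in remaining if f in e) / len(entries) if entries else 0.0
            score = alpha * sup + (1 - alpha) * unsup[f]
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
        remaining = [e for e in remaining if best not in e]
    return selected


# Toy data: feature 0 separates the classes, feature 1 is constant.
X_lab = [[0, 0, 1], [0, 0, 0], [1, 0, 1], [1, 0, 0]]
y_lab = [0, 0, 1, 1]
X_unl = [[0, 0, 0], [0, 0, 1], [1, 0, 0], [1, 0, 1], [0, 0, 1], [1, 0, 0]]
print(select_features(X_lab, y_lab, X_unl, k=2))
```

On the toy data, feature 0 is chosen first because it discerns every differently labeled pair, and the constant feature 1 is never preferred. A weighted sum is only one way to merge the two scores; the paper may combine them differently.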
computer science, artificial intelligence