STAR: semi-supervised tripartite attribute reduction
Keyu Liu, Damo Qian, Tianrui Li, Xibei Yang, Tengyu Yin, Xin Yang, Dun Liu
DOI: https://doi.org/10.1007/s13042-024-02472-1
2024-12-09
International Journal of Machine Learning and Cybernetics
Abstract: Attribute reduction, also known as feature selection, is widely used to preprocess data, especially high-dimensional data, for learning models. In general, conventional feature selection algorithms, both supervised and unsupervised, perform well when the data are completely labeled or completely unlabeled, respectively. Nevertheless, they fail on partially labeled data, where only a very limited portion is annotated with labels. In this work, we therefore propose a novel scheme, dubbed Semi-supervised Tripartite Attribute Reduction (STAR), to handle the paucity of label information. Essentially, STAR is a three-phase framework: (1) divide the original partially labeled data into three parts, namely augmented labeled data, pseudo-labeled data, and updated unlabeled data; (2) devise three types of fuzzy measurements, namely a fuzzy-rough dependency-based index, fuzzy joint entropy, and fuzzy inter-cluster distance, to evaluate feature quality on these three parts; (3) derive the qualified feature subset that maximizes the feature importance obtained by fusing the three evaluation criteria. STAR is validated in extensive experiments against nine other well-established feature selection algorithms: one unsupervised, two supervised, and six semi-supervised methods. The reported results demonstrate that base classifiers fed with features selected by STAR are more accurate, suggesting its superiority in the presence of partially labeled data.
computer science, artificial intelligence
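
The three-phase scheme summarized in the abstract (split the data into three parts, score features on each part, then pick the subset that maximizes a fused score) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' algorithm: the abstract does not specify the fuzzy measures, the augmentation step, or the fusion rule, so everything below is an assumption. In particular, `separation_score` and `spread_score` are simple stand-ins for the fuzzy-rough dependency, fuzzy joint entropy, and fuzzy inter-cluster distance criteria; the nearest-centroid pseudo-labeling with a margin `threshold`, the equal `weights`, the greedy forward search, and the convention that `-1` marks an unlabeled sample are all hypothetical choices made for illustration.

```python
import numpy as np


def split_three_parts(X, y, threshold=0.5):
    """Phase 1 (assumed rule): split indices into labeled, pseudo-labeled, unlabeled.

    y uses -1 for unlabeled samples. Unlabeled samples whose nearest class
    centroid is clearly closer than the second nearest receive a pseudo-label.
    The labeled part is used as is; no augmentation is attempted here.
    Requires at least two classes among the labeled samples.
    """
    labeled = np.flatnonzero(y != -1)
    unlabeled = np.flatnonzero(y == -1)
    classes = np.unique(y[labeled])
    centroids = np.stack(
        [X[labeled][y[labeled] == c].mean(axis=0) for c in classes]
    )
    # Distance from every unlabeled sample to every class centroid.
    dist = np.linalg.norm(X[unlabeled][:, None, :] - centroids[None, :, :], axis=2)
    ordered = np.sort(dist, axis=1)
    margin = 1.0 - ordered[:, 0] / (ordered[:, 1] + 1e-12)  # near 1 = confident
    confident = margin >= threshold
    pseudo_labels = classes[np.argmin(dist, axis=1)][confident]
    return labeled, unlabeled[confident], pseudo_labels, unlabeled[~confident]


def separation_score(X, y):
    """Stand-in label-aware criterion: between-class over within-class scatter."""
    overall = X.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(y):
        part = X[y == c]
        center = part.mean(axis=0)
        between += len(part) * np.sum((center - overall) ** 2)
        within += np.sum((part - center) ** 2)
    return between / (within + 1e-12)


def spread_score(X):
    """Stand-in label-free criterion: mean per-feature variance."""
    return float(X.var(axis=0).mean())


def select_features(X, y, k, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Phases 2 and 3 (assumed): greedy forward search on a weighted fused score."""
    lab, pse, pse_y, unl = split_three_parts(X, y)

    def fused(cols):
        score = weights[0] * separation_score(X[np.ix_(lab, cols)], y[lab])
        if len(pse) > 0:
            score += weights[1] * separation_score(X[np.ix_(pse, cols)], pse_y)
        if len(unl) > 0:
            score += weights[2] * spread_score(X[np.ix_(unl, cols)])
        return score

    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda j: fused(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Calling `select_features(X, y, k)` on a NumPy feature matrix `X` and a label vector `y` returns the indices of `k` features in the order they were chosen. The three criteria are summed without rescaling, which is a simplification; a faithful implementation would follow the measures and fusion rule defined in the paper itself.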