Uncertainty measurement of partially labeled categorical data with application to semi-supervised attribute reduction

Pei Wang, Qinli Zhang, Witold Pedrycz, Zhaowen Li, Ching-Feng Wen
DOI: https://doi.org/10.1007/s10462-023-10518-z
IF: 9.588
2023-01-01
Artificial Intelligence Review
Abstract: In many practical applications of machine learning, a large amount of categorical data is only partially labeled because labelling is costly. Semi-supervised learning algorithms are needed to deal with such data. This paper studies uncertainty measurement (UM) of partially labeled categorical data and considers semi-supervised attribute reduction in a partially labeled categorical decision information system (p-CDIS). It is first noted that a discernibility pair set for categorical data is in fact a distinguishable relation. Then, a p-CDIS is divided into two categorical decision information systems: the labeled categorical decision information system (l-CDIS) and the unlabeled categorical decision information system (u-CDIS). Next, based on the indistinguishable relation, the distinguishable relation and the dependence function, four degrees of importance are defined. Each is a weighted sum of a measure on the l-CDIS and a measure on the u-CDIS, with weights determined by the label missing rate, and can be regarded as a UM of the p-CDIS. Numerical experiments and statistical tests on 10 datasets verify their effectiveness. In addition, an adaptive semi-supervised reduction algorithm based on the defined degrees of importance is proposed, which automatically adapts to various label missing rates. Finally, the results of experiments and statistical tests on 10 datasets show that the proposed algorithm is statistically better than some state-of-the-art algorithms in terms of classification accuracy.
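The weighted-sum idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's four formulas, so everything below is an assumption made for illustration: the function names, the use of the classical rough-set dependence degree on the labeled part, the fraction of distinguishable object pairs on the unlabeled part, and the weights (1 - rate) and rate, where rate is the label missing rate. Unlabeled objects are marked with `None`.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into blocks of objects that agree on every attribute in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependence(rows, labels, attrs):
    """Classical rough-set dependence degree: share of objects in label-consistent blocks."""
    if not rows:
        return 0.0
    positive = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return positive / len(rows)


def distinguishable_ratio(rows, attrs):
    """Share of ordered object pairs (x, y) that differ on at least one attribute in attrs."""
    n = len(rows)
    if n == 0:
        return 0.0
    indistinguishable_pairs = sum(len(block) ** 2 for block in partition(rows, attrs))
    return 1.0 - indistinguishable_pairs / (n * n)


def combined_measure(rows, labels, attrs):
    """Hypothetical weighted sum over the labeled and unlabeled parts (assumed weighting)."""
    if not labels:
        return 0.0
    labeled = [i for i, y in enumerate(labels) if y is not None]
    unlabeled = [i for i, y in enumerate(labels) if y is None]
    rate = len(unlabeled) / len(labels)  # label missing rate
    l_rows = [rows[i] for i in labeled]
    l_labels = [labels[i] for i in labeled]
    u_rows = [rows[i] for i in unlabeled]
    return (1.0 - rate) * dependence(l_rows, l_labels, attrs) + rate * distinguishable_ratio(
        u_rows, attrs
    )


if __name__ == "__main__":
    # Toy data: two categorical attributes, the last two objects are unlabeled.
    rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
    labels = ["p", "p", None, None]
    print(combined_measure(rows, labels, [0]))
    print(combined_measure(rows, labels, [0, 1]))
```

With this weighting, a fully labeled system reduces to the dependence degree alone and a fully unlabeled one to the pairwise distinguishability alone, which is one plausible reading of how a measure could adapt to the label missing rate.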