Semi-supervised attribute reduction for partially labelled multiset-valued data via a prediction label strategy

Zhaowen Li, Taoli Yang, Jinjin Li
DOI: https://doi.org/10.1016/j.ins.2023.03.127
IF: 8.1
2023-07-01
Information Sciences
Abstract: Because labelled data is costly to obtain, large amounts of partially labelled data exist. For this type of data, traditional rough set models cannot represent the distribution of objects in real data well, which limits effective decision-making and classification. Replacing the missing labels with prediction labels can compensate for the absence of labels to some extent, so multiset-valued data with prediction labels is considered. This paper studies semi-supervised attribute reduction for partially labelled multiset-valued data via a prediction label strategy. First, the distance between information values in a multiset-valued decision information system (MSVDIS) is constructed, and the tolerance relation on the object set of an MSVDIS is introduced. Then, a partially labelled multiset-valued decision information system (p-MSVDIS) is defined. Next, a prediction label strategy is proposed: the existing labels in a p-MSVDIS remain the same, and the missing labels are replaced with prediction labels. As a result, a new MSVDIS is obtained, and the object set is reclassified by the decision attribute. Moreover, dependence and conditional entropy in the obtained MSVDIS are defined, and some of their properties are discussed. Semi-supervised attribute reduction algorithms in a p-MSVDIS based on dependence and conditional entropy are then designed. Finally, experiments on real datasets show that the prediction label strategy outperforms the traditional rough set approach and that the designed algorithms achieve better classification and outlier detection performance than existing algorithms.
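The pipeline outlined in the abstract (distance, tolerance relation, prediction labels, dependence) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the distance is a total variation distance between normalised multisets, the threshold `theta` is arbitrary, missing labels are predicted by majority vote within the tolerance class, and all function names are hypothetical.

```python
from collections import Counter

def distance(a, b):
    # ASSUMPTION: total variation distance between normalised multisets;
    # the paper's own distance is not given in the abstract.
    na, nb = sum(a.values()), sum(b.values())
    return 0.5 * sum(abs(a[k] / na - b[k] / nb) for k in set(a) | set(b))

def tolerance_class(i, objects, attrs, theta):
    # Objects within theta of object i on every attribute in attrs.
    return [j for j, y in enumerate(objects)
            if all(distance(objects[i][a], y[a]) <= theta for a in attrs)]

def predict_labels(objects, labels, attrs, theta):
    # Existing labels stay unchanged; each missing label (None) is replaced.
    # ASSUMPTION: the replacement is the majority label among labelled
    # objects in the tolerance class. With no labelled neighbour it stays None.
    filled = list(labels)
    for i, lab in enumerate(labels):
        if lab is None:
            votes = Counter(labels[j]
                            for j in tolerance_class(i, objects, attrs, theta)
                            if labels[j] is not None)
            if votes:
                filled[i] = votes.most_common(1)[0][0]
    return filled

def dependence(objects, labels, attrs, theta):
    # Share of objects whose tolerance class carries a single label
    # (size of the positive region divided by the number of objects).
    pure = sum(1 for i in range(len(objects))
               if len({labels[j]
                       for j in tolerance_class(i, objects, attrs, theta)}) == 1)
    return pure / len(objects)

# Toy data (hypothetical): two multiset-valued attributes, two missing labels.
objects = [
    {"a": Counter({"x": 2, "y": 1}), "b": Counter({"p": 1})},
    {"a": Counter({"x": 2, "y": 1}), "b": Counter({"p": 1})},
    {"a": Counter({"z": 3}), "b": Counter({"q": 2})},
    {"a": Counter({"z": 2, "y": 1}), "b": Counter({"q": 1})},
]
labels = ["yes", None, "no", None]
filled = predict_labels(objects, labels, ["a", "b"], theta=0.4)
gamma = dependence(objects, filled, ["a", "b"], theta=0.4)
```

A reduction algorithm in this spirit would search for a small attribute subset whose dependence on the filled labels matches that of the full attribute set; the sketch stops short of that search.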
computer science, information systems