Label Distribution-based Open-world Semi-supervised Learning

Qinkai Yang, Chao Tan, Junzhao Hao, Genlin Ji
DOI: https://doi.org/10.1109/cbd63341.2023.00041
2023-01-01
Abstract: In traditional supervised learning, each training sample consists of input data paired with a label. In most cases, however, obtaining a large amount of labeled data is difficult, and labeling requires substantial prior knowledge, which incurs high costs. Semi-supervised learning (SSL) is an effective approach when only a small portion of a dataset is labeled: it aims to build a better classifier by also leveraging the unlabeled data. In this work, we tackle a more realistic SSL problem known as open-world SSL. Its objective is to recognize samples of known (seen) classes in the unlabeled data while simultaneously classifying samples of novel (unseen) classes that exist in the unlabeled data. Because the distribution of seen- and unseen-class samples is unknown, models often focus too heavily on the seen classes and neglect the unseen ones. We propose a new method for open-world SSL that leverages the sample information of seen classes to obtain reliable label distributions, and then uses these label distributions to guide the model in correcting the imbalance between seen- and unseen-class samples during learning. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our approach, which outperforms existing mainstream methods on five different datasets.
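The abstract does not say how the label distribution is estimated or how it guides training. The sketch below shows one generic way such guidance can work, and it is an illustration under stated assumptions, not the authors' method. It assumes (1) a hypothetical `novel_ratio`, the fraction of unlabeled samples believed to belong to novel classes; (2) that seen-class mass follows the labeled class frequencies while novel-class mass is split uniformly; and (3) that the guidance takes the form of a KL-divergence penalty between this target distribution and the model's average prediction over an unlabeled batch. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def target_label_distribution(labeled_targets: Sequence[int],
                              num_seen: int,
                              num_novel: int,
                              novel_ratio: float) -> List[float]:
    """Hypothetical prior over seen + novel classes.

    Assumption: seen-class mass (1 - novel_ratio) is split in proportion to
    the labeled class counts; novel-class mass (novel_ratio) is split
    uniformly, since no labels exist for novel classes.
    """
    counts = Counter(labeled_targets)
    total = sum(counts[c] for c in range(num_seen))
    seen = [(1.0 - novel_ratio) * counts[c] / total for c in range(num_seen)]
    novel = [novel_ratio / num_novel] * num_novel
    return seen + novel


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def distribution_alignment_loss(batch_logits: Sequence[Sequence[float]],
                                target: Sequence[float],
                                eps: float = 1e-12) -> float:
    """KL(target || mean prediction) over an unlabeled batch.

    The penalty grows when the batch-average prediction piles up on a few
    classes (for example, only the seen ones) relative to the target.
    """
    probs = [softmax(row) for row in batch_logits]
    n = len(probs)
    mean_pred = [sum(p[j] for p in probs) / n for j in range(len(target))]
    return sum(t * math.log((t + eps) / (q + eps))
               for t, q in zip(target, mean_pred) if t > 0)


if __name__ == "__main__":
    # Two seen classes (0, 1) with balanced labels, two novel classes.
    target = target_label_distribution([0, 0, 1, 1], num_seen=2,
                                       num_novel=2, novel_ratio=0.5)
    print(target)  # → [0.25, 0.25, 0.25, 0.25]
    biased = [[5.0, 5.0, 0.0, 0.0], [5.0, 5.0, 0.0, 0.0]]  # ignores novel classes
    balanced = [[1.0, 1.0, 1.0, 1.0]]
    print(distribution_alignment_loss(biased, target))
    print(distribution_alignment_loss(balanced, target))
```

In this sketch, a batch whose predictions ignore the novel classes receives a large penalty, while a batch that matches the target receives none; adding such a term to the training loss is one way to counter the bias toward seen classes that the abstract describes.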
What problem does this paper attempt to address?