Abstract: Semi-supervised learning (SSL) methods effectively leverage unlabeled data to improve model generalization. However, SSL models often underperform in open-set scenarios, where unlabeled data contain outliers from novel categories that do not appear in the labeled set. In this paper, we study the challenging and realistic open-set SSL setting, where the goal is to both correctly classify inliers and to detect outliers. Intuitively, the inlier classifier should be trained on inlier data only. However, we find that inlier classification performance can be largely improved by incorporating high-confidence pseudo-labeled data, regardless of whether they are inliers or outliers. Also, we propose to utilize non-linear transformations to separate the features used for inlier classification and outlier detection in the multi-task learning framework, preventing adverse effects between them. Additionally, we introduce pseudo-negative mining, which further boosts outlier detection performance. The three ingredients lead to what we call Simple but Strong Baseline (SSB) for open-set SSL. In experiments, SSB greatly improves both inlier classification and outlier detection performance, outperforming existing methods by a large margin. Our code will be released at https://github.com/YUE-FAN/SSB.

What problem does this paper attempt to address?

The paper addresses how to improve inlier classification and outlier detection at the same time in the open-set semi-supervised learning (open-set SSL) setting. Specifically:

1. **Limitations of Standard SSL**: Traditional semi-supervised learning (SSL) methods assume that unlabeled data come from the same class distribution as the labeled data. In an open-set scenario, however, the unlabeled data may contain outliers from novel classes, and model performance degrades.
2. **Challenges in Open-set SSL**: The goal is not only to correctly classify inliers from the known classes but also to detect outliers from unknown classes. Existing methods usually filter out suspected outliers before training the classifier, which causes two main problems:
   - Many useful inliers are filtered out as well, which lowers the utilization of unlabeled data and hurts classification performance.
   - The inlier classifier and the outlier detector share one feature encoder and interfere with each other, which degrades detection performance.

To solve these problems, the authors propose a Simple but Strong Baseline (SSB), which rests on three components:

1. **Pseudo-label-enhanced Classifier**: Instead of detector-based filtering, SSB trains the classifier on high-confidence pseudo-labeled data, regardless of whether they are inliers or outliers. This raises the utilization of unlabeled data, and useful outliers act as natural data augmentation for the inlier classes.
2. **Non-linear Feature Separation**: To reduce the interference between the inlier classifier and the outlier detector, SSB inserts non-linear transformation layers (MLP projection heads) between the shared feature encoder and each of the two heads, so that classification and detection operate on separate features and both tasks improve.
3. **Pseudo-negative Mining**: Samples with low inlier confidence are selected as pseudo-outliers, which increases the diversity of the outlier detector's training data and further improves outlier detection.

With these components, SSB significantly improves both inlier classification and outlier detection in experiments, surpassing existing methods.

### Summary of Mathematical Formulas

- **Pseudo-label Classification Loss**:
  \[ L_{u}^{\text{cls}}(X_u)=\frac{1}{B_u}\sum_{i=1}^{B_u}\mathbb{1}\left(\max\hat{p}_u^i\geq\tau\right)H(\hat{p}_u^i,\hat{y}_u^i) \]
  where \(\hat{p}_u^i=\text{softmax}(h_c(g_c(f(x_u^i))))\), \(\hat{y}_u^i=\arg\max\hat{p}_u^i\), \(H(\cdot,\cdot)\) is the cross-entropy loss, and \(\tau\) is a predefined confidence threshold.
- **Total Classification Loss**:
  \[ L_{\text{cls}}(X_l,X_u)=L_l^{\text{cls}}(X_l)+L_u^{\text{cls}}(X_u) \]
  where \(L_l^{\text{cls}}(X_l)\) is the standard cross-entropy loss on labeled data.
- **Outlier Detection Loss**:
  \[ L_{l}^{\text{det}}(X_l)=-\frac{1}{B_l}\sum_{i=1}^{B_l}\left[\log\left(p_{y_i}(x_l^i)\right)+\frac{1}{K}\sum_{k\neq y_i}\log\left(1-p_k(x_l^i)\right)\right] \]
  where \(p_k(x_l^i)\) is the inlier score of the \(k\)-th class and \(K=|C|-1\).
- **Pseudo-negative Sample Mining Loss**:
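The pseudo-label classification loss summarized above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `softmax` and `pseudo_label_loss`, the threshold value 0.95, and the example logits are hypothetical and not taken from the paper. Following the formula as summarized, the same prediction supplies both the pseudo-label and the loss, so the cross-entropy reduces to the negative log of the maximum probability.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label_loss(batch_logits, tau=0.95):
    """Batch mean of 1(max p >= tau) * H(p, argmax p).

    With a one-hot pseudo-label taken from the same prediction,
    the cross-entropy equals -log(max p). Samples below the
    threshold contribute zero but still count towards B_u.
    """
    total = 0.0
    for logits in batch_logits:
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= tau:  # indicator 1(max p >= tau)
            total += -math.log(confidence)
    return total / len(batch_logits)


# Hypothetical classifier logits for three unlabeled samples, three inlier classes.
# The second sample is not confident enough and is masked out.
logits = [[8.0, 0.0, 0.0], [1.0, 0.9, 0.8], [0.0, 0.0, 9.0]]
loss = pseudo_label_loss(logits, tau=0.95)
```

Note that the mask does not ask whether a sample is an inlier or an outlier; any sample whose prediction clears the threshold contributes, which matches the first component described above.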
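The detection loss on labeled data, as written in the formula summary, can be illustrated in the same way. The sketch below assumes the per-class inlier scores \(p_k(x)\) are already available as probabilities in (0, 1); how the detector produces them is not specified in this summary. The function name `detection_loss`, the `eps` guard, and the example scores are hypothetical.

```python
import math


def detection_loss(inlier_scores, labels, eps=1e-12):
    """Labeled-data detection loss from the formula summary.

    inlier_scores[i][k] is p_k(x_i), the inlier score of class k.
    labels[i] is the ground-truth class y_i.
    The true class is pushed towards 1; the other K = |C| - 1
    classes are pushed towards 0, averaged over K.
    """
    total = 0.0
    for scores, y in zip(inlier_scores, labels):
        k_negatives = len(scores) - 1  # K = |C| - 1
        positive = math.log(scores[y] + eps)
        negatives = sum(
            math.log(1.0 - p + eps) for k, p in enumerate(scores) if k != y
        )
        total += positive + negatives / k_negatives
    return -total / len(labels)


# Hypothetical scores for one labeled sample of class 0.
confident = detection_loss([[0.9, 0.1, 0.2]], [0])  # well-separated scores
confused = detection_loss([[0.4, 0.6, 0.5]], [0])   # ambiguous scores
```

A detector that assigns a high score to the true class and low scores to the rest incurs a smaller loss than one with ambiguous scores, so `confident` is smaller than `confused` here.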
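The feature separation idea, a shared encoder \(f\) followed by task-specific projection heads, can also be sketched as a forward pass. The code below is a toy illustration in plain Python with random weights, not the authors' architecture: the layer sizes, the names `make_branch`, `classifier_branch`, and `detector_branch`, and the use of a sigmoid to turn detector outputs into per-class inlier scores are all assumptions made for the example.

```python
import math
import random


def linear(weights, bias, x):
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases for an n_in -> n_out layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_branch(feat_dim, hidden_dim, out_dim, rng):
    """A two-layer MLP projection head g followed by a linear output head h."""
    g1 = init_layer(hidden_dim, feat_dim, rng)
    g2 = init_layer(hidden_dim, hidden_dim, rng)
    h = init_layer(out_dim, hidden_dim, rng)

    def forward(features):
        z = relu(linear(*g1, features))
        z = linear(*g2, z)  # task-specific projected features
        return linear(*h, z)

    return forward


rng = random.Random(0)
num_classes, input_dim, feat_dim, hidden_dim = 3, 5, 4, 8

# Stand-in for the shared encoder f (a single layer here, for brevity).
encoder = init_layer(feat_dim, input_dim, rng)
# Each task gets its own projection head, so the two tasks do not
# read the shared features directly.
classifier_branch = make_branch(feat_dim, hidden_dim, num_classes, rng)
detector_branch = make_branch(feat_dim, hidden_dim, num_classes, rng)

x = [0.2, -1.0, 0.5, 0.3, 0.7]
shared = relu(linear(*encoder, x))
class_logits = classifier_branch(shared)  # h_c(g_c(f(x)))
# Assumed: a sigmoid maps each detector output to an inlier score in (0, 1).
inlier_scores = [1.0 / (1.0 + math.exp(-s)) for s in detector_branch(shared)]
```

Both branches consume the same `shared` features, but each applies its own non-linear projection before its output layer, which is the separation the second component describes.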

SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning
