Abstract: It is usually expected that learning performance can be improved by exploiting unlabeled data, particularly when the amount of labeled data is limited. However, it has been reported that, in some cases, existing semi-supervised learning approaches perform even worse than supervised ones that use only labeled data. For this reason, it is desirable to develop safe semi-supervised learning approaches that will not significantly reduce learning performance when unlabeled data are used. This paper focuses on improving the safeness of semi-supervised support vector machines (S3VMs). First, the S3VM-us approach is proposed. It employs a conservative strategy and uses only the unlabeled instances that are very likely to be helpful, while avoiding the use of highly risky ones. This approach improves safeness, but its performance improvement from unlabeled data is often much smaller than that of S3VMs. In order to develop an approach that is both safe and well-performing, we examine the fundamental assumption of S3VMs, i.e., low-density separation. Based on the observation that multiple good candidate low-density separators may be identified from training data, safe semi-supervised support vector machines (S4VMs) are proposed. This approach uses multiple low-density separators to approximate the ground-truth decision boundary and maximizes the performance improvement over inductive SVMs for any candidate separator. Under the assumption employed by S3VMs, it is shown that S4VMs are provably safe and that the performance improvement from unlabeled data can be maximized. An out-of-sample extension of S4VMs is also presented, which allows S4VMs to make predictions on unseen instances. Our empirical study on a broad range of data shows that the overall performance of S4VMs is highly competitive with that of S3VMs. Moreover, whereas S3VMs hurt performance significantly in many cases, S4VMs rarely perform worse than inductive SVMs.
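The worst-case selection described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes labels in {-1, +1}, assumes the candidate low-density separators are already available as label vectors on the unlabeled instances, and replaces any real optimizer with brute-force enumeration, which is feasible only for a handful of instances. The names `gain`, `loss`, `worst_case_improvement`, `safe_labeling`, and the trade-off parameter `lam` are hypothetical and chosen for illustration.

```python
from itertools import product


def gain(y, y_true, y_svm):
    # Instances that y labels correctly while the inductive SVM does not.
    return sum(1 for a, t, s in zip(y, y_true, y_svm) if a == t and s != t)


def loss(y, y_true, y_svm):
    # Instances that y labels incorrectly while the inductive SVM is correct.
    return sum(1 for a, t, s in zip(y, y_true, y_svm) if a != t and s == t)


def worst_case_improvement(y, candidates, y_svm, lam=1.0):
    # Treat each candidate separator's labeling in turn as the ground truth
    # and keep the least favorable outcome for y.
    return min(gain(y, c, y_svm) - lam * loss(y, c, y_svm) for c in candidates)


def safe_labeling(candidates, y_svm, lam=1.0):
    # Brute force over all 2^n labelings (illustration only, tiny n).
    n = len(y_svm)
    best = max(
        product((-1, 1), repeat=n),
        key=lambda y: worst_case_improvement(y, candidates, y_svm, lam),
    )
    return list(best)


# Toy data (made up): inductive SVM predictions on four unlabeled instances
# and two candidate low-density separators that disagree on the last one.
y_svm = [1, 1, -1, -1]
candidates = [[1, -1, -1, -1], [1, -1, -1, 1]]
y_safe = safe_labeling(candidates, y_svm)
```

In this sketch, choosing the inductive SVM's own predictions always scores exactly zero against every candidate, so the maximized worst-case score can never be negative. That mirrors the safeness argument in the abstract: if the ground-truth boundary is among the candidates, the selected labeling is no worse than the inductive SVM under this score.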

Towards Safe Semi-Supervised Learning for Multivariate Performance Measures

Category-Level Regularized Unlabeled-to-Labeled Learning for Semi-supervised Prostate Segmentation with Multi-site Unlabeled Data

Learning Safe Prediction for Semi-Supervised Regression

Towards Safe Weakly Supervised Learning

Semi-Supervised Approaches to Efficient Evaluation of Model Prediction Performance

S4VM: Safe Semi-Supervised Support Vector Machine

Towards Making Unlabeled Data Never Hurt

Meta-Semi: A Meta-learning Approach for Semi-supervised Learning

Towards Automated Semi-Supervised Learning

Efficient Evaluation of Prediction Rules in Semi-Supervised Settings under Stratified Sampling

Semi-Supervised Empirical Risk Minimization: Using unlabeled data to improve prediction

Robust Deep Semi-Supervised Learning: A Brief Introduction

Safe semi-supervised learning: a brief introduction

Efficient Estimation and Evaluation of Prediction Rules in Semi-Supervised Settings under Stratified Sampling

Not All Parameters Should Be Treated Equally: Deep Safe Semi-supervised Learning under Class Distribution Mismatch

Do not trust what you trust: Miscalibration in Semi-supervised Learning

Reliable Weakly Supervised Learning: Maximize Gain and Maintain Safeness

A Benchmark on Robust Semi-Supervised Learning in Open Environments

SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning

Learning Safe Multi-Label Prediction for Weakly Labeled Data

SemiReward: A General Reward Model for Semi-supervised Learning