Abstract: In the era of big data, many business scenarios produce data with few labelled samples and many unlabelled samples. Such data are difficult to classify, which makes it hard to provide effective decision support for a business. A common approach in this setting is disagreement-based semisupervised learning, in which multiple models exchange high-confidence samples as pseudolabelled samples to improve each model's classification performance. Because these pseudolabelled samples inevitably contain label noise, they may interfere with subsequent model learning and damage the robustness of the ensemble model. To solve this problem, a semisupervised classification algorithm based on noise learning theory and a disagreement cotraining framework is proposed. First, probably approximately correct (PAC) estimation theory under label noise is applied to analyze the relationship between the label noise level and robust model estimation over multiple rounds of cotraining. Based on this relationship, a disagreement elimination algorithm framework is then constructed that cotrains multiple models, namely the feature augmentation via nonparametrics and selection (FANS) algorithm and the L1-penalized logistic regression (PLR) algorithm. The experimental results show that the proposed algorithm yields not only a high-confidence sample set that satisfies the upper bound on the label noise level but also a robust ensemble model that resists sampling bias. This work provides a new research perspective for disagreement-based semisupervised learning theory.
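
The pseudolabel exchange described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the FANS and PLR learners are replaced by a toy nearest-centroid classifier, each model sees one feature as its "view", and the PAC-based noise bound is replaced by a fixed confidence threshold. The names `CentroidModel`, `cotrain`, `ensemble_predict` and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import math


class CentroidModel:
    """Toy stand-in learner (assumption): nearest centroid on one feature."""

    def __init__(self, view):
        self.view = view        # index of the single feature this model sees
        self.centroids = {}

    def fit(self, X, y):
        # Assumes both classes are present, which holds when the labelled
        # seed set contains at least one sample of each class.
        for label in (0, 1):
            vals = [x[self.view] for x, t in zip(X, y) if t == label]
            self.centroids[label] = sum(vals) / len(vals)
        return self

    def predict_proba(self, x):
        # Softmax over negative distances to the two centroids: P(label = 1).
        d0 = abs(x[self.view] - self.centroids[0])
        d1 = abs(x[self.view] - self.centroids[1])
        e0, e1 = math.exp(-d0), math.exp(-d1)
        return e1 / (e0 + e1)


def cotrain(models, X_lab, y_lab, X_unl, threshold=0.9, rounds=3):
    """Two-model cotraining: each model pseudolabels samples for its peer."""
    # Each model keeps its own training set, seeded with the labelled data.
    train = [(list(X_lab), list(y_lab)) for _ in models]
    pool = list(X_unl)
    for _ in range(rounds):
        for model, (X, y) in zip(models, train):
            model.fit(X, y)
        moved = set()
        for i, model in enumerate(models):
            peer_X, peer_y = train[1 - i]      # the other model's training set
            for j, x in enumerate(pool):
                p = model.predict_proba(x)
                confidence, label = max(p, 1 - p), int(p >= 0.5)
                if confidence >= threshold:    # hand over confident samples only
                    peer_X.append(x)
                    peer_y.append(label)
                    moved.add(j)
        if not moved:                          # nothing confident left to exchange
            break
        pool = [x for j, x in enumerate(pool) if j not in moved]
    for model, (X, y) in zip(models, train):   # final fit on the enlarged sets
        model.fit(X, y)
    return models, pool


def ensemble_predict(models, x):
    """Average the two models' probabilities and threshold at 0.5."""
    avg = sum(m.predict_proba(x) for m in models) / len(models)
    return int(avg >= 0.5)


models, pool = cotrain(
    [CentroidModel(0), CentroidModel(1)],
    X_lab=[(0.0, 0.0), (4.0, 4.0)],
    y_lab=[0, 1],
    X_unl=[(0.2, 0.1), (3.9, 4.2), (2.0, 2.1)],
)
print(pool)  # [(2.0, 2.1)]: the ambiguous sample is never pseudolabelled
```

In this sketch the threshold is the only guard against label noise. Raising it admits fewer but cleaner pseudolabels, which is the trade-off the abstract addresses with a theoretical upper bound on the label noise level.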

Semi-Supervised Ensemble Classification Method Based On Near Neighbor And Its Application

Research on Multi-Label Semi-Supervised Learning Algorithm Based on Dual Selection Criteria

Pseudo-Labeling Optimization Based Ensemble Semi-Supervised Soft Sensor in the Process Industry

Deep Learning for Industrial KPI Prediction: when Ensemble Learning Meets Semi-Supervised Data

A Novel Semi-supervised Classification Method Based on Class Certainty of Samples

A Semisupervised Classification Algorithm Combining Noise Learning Theory and a Disagreement Cotraining Framework

A Semi-Supervised Stacked Autoencoder Using the Pseudo Label for Classification Tasks

A novel semi-supervised hyperspectral image classification approach based on spatial neighborhood information and classifier combination

Samples selection in semi-supervised classification

Semi-Supervised Learning: Exploiting Unlabeled Data with Symmetrical Distribution and High Confidence

A Robust Semi-Supervised SVM Via Ensemble Learning

Semi-supervised Classifier Ensemble Model for High-Dimensional Data

A Semi-Supervised Learning Algorithm Via Adaptive Laplacian Graph

A Semi-Supervised Rough Set Model for Classification Based on Active Learning and Co-Training

Semi-Supervised Clustering Algorithm Based on Small Size of Labeled Data

Semisupervised Particle Swarm Optimization for Classification

Exploiting Ensemble Method in Semi-Supervised Learning

A Generic Semi-Supervised Deep Learning-Based Approach for Automated Surface Inspection

Toward Effective Semi-supervised Node Classification with Hybrid Curriculum Pseudo-labeling

Semi-Supervised Sentiment Classification with a Ensemble Strategy

A Novel Semi-Supervised Adaboost Technique Based On Improved Tri-Training