Abstract: When collecting training data, completely eliminating incorrect annotations (noisy labels) is difficult and expensive, even with manual verification by experts from crowdsourcing platforms. When trained on datasets that contain noisy labels, over-parameterized deep neural networks (DNNs) tend to overfit, leading to poor generalization and classification performance. As a result, noisy label learning (NLL) has received significant attention in recent years. Existing research shows that although DNNs eventually fit all training data, they fit clean samples first and only gradually overfit to noisy samples. Mainstream methods exploit this characteristic to divide the training data, but they face two issues: class imbalance in the resulting data subsets, and an optimization conflict between unsupervised contrastive representation learning and supervised learning. To address these issues, we propose BPT-PLR, a Balanced Partitioning and Training framework with Pseudo-Label Relaxed contrastive loss. It comprises two key processes: a balanced partitioning process with a two-dimensional Gaussian mixture model (BP-GMM) and a semi-supervised oversampling training process with a pseudo-label relaxed contrastive loss (SSO-PLR). The former uses both semantic feature information and model prediction results to identify noisy labels, and introduces a balancing strategy to keep the divided subsets as class-balanced as possible. The latter replaces the unsupervised contrastive loss with the latest pseudo-label relaxed contrastive loss, reducing optimization conflicts between semi-supervised and unsupervised contrastive losses to improve performance. We validate the effectiveness of BPT-PLR on four benchmark datasets in the NLL field: CIFAR-10/100, Animal-10N, and Clothing1M. Extensive experimental comparisons with state-of-the-art methods demonstrate that BPT-PLR achieves optimal or near-optimal performance.
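
To make the class-balance issue concrete, the following is a minimal sketch of one possible balanced partitioning rule. It is not the BP-GMM procedure itself, whose details the abstract does not give. It assumes that a per-sample clean probability is already available (for example, a posterior from a fitted mixture model), and the function name `balanced_partition`, the `threshold` parameter, and the equal per-class quota are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def balanced_partition(clean_prob, labels, threshold=0.5):
    """Split sample indices into (labeled, unlabeled) using a per-class quota.

    clean_prob: assumed per-sample probability that the given label is clean.
    labels:     the (possibly noisy) class label of each sample.
    """
    if not labels:
        return [], []

    # Plain thresholding only decides how many samples count as clean overall.
    n_clean = sum(p > threshold for p in clean_prob)

    # Group sample indices by their given label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Hypothetical balancing rule: every observed class gets the same quota.
    quota = n_clean // len(by_class)

    labeled = []
    for indices in by_class.values():
        # Keep the samples of this class most likely to be clean.
        ranked = sorted(indices, key=lambda i: clean_prob[i], reverse=True)
        labeled.extend(ranked[:quota])

    labeled_set = set(labeled)
    unlabeled = [i for i in range(len(labels)) if i not in labeled_set]
    return sorted(labeled), unlabeled
```

With clean probabilities `[0.9, 0.8, 0.7, 0.6, 0.45, 0.4, 0.1]` and labels `[0, 0, 0, 0, 1, 1, 1]`, a plain 0.5 threshold would place only class-0 samples in the labeled subset, whereas this rule keeps two samples from each class. The trade-off is that some lower-confidence samples from the under-represented class are admitted, so a real implementation would likely need an additional safeguard.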

PLBR: A Semi-supervised Document Key Information Extraction Via Pseudo-labeling Bias Rectification

CRMSP: A Semi-supervised Approach for Key Information Extraction with Class-Rebalancing and Merged Semantic Pseudo-Labeling

Dual Knowledge Distillation on Multiview Pseudo Labels for Unsupervised Person Re-Identification

BPT-PLR: A Balanced Partitioning and Training Framework with Pseudo-Label Relaxed Contrastive Loss for Noisy Label Learning

Semi-Supervised Dual Relation Learning for Multi-Label Classification

Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-identification

On Pseudo-Labeling for Class-Mismatch Semi-Supervised Learning

In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning

Debiased Pseudo Labeling in Self-Training

Decoupling Pseudo Label Disambiguation and Representation Learning for Generalized Intent Discovery

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

Pseudo Label Selection is a Decision Problem

Pseudo Labels for Imbalanced Multi-Label Learning

DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples

IDPL: Intra-subdomain adaptation adversarial learning segmentation method based on Dynamic Pseudo Labels

A Semi-Supervised Stacked Autoencoder Using the Pseudo Label for Classification Tasks

You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling

Learning Label Refinement and Threshold Adjustment for Imbalanced Semi-Supervised Learning

Information Redundancy and Biases in Public Document Information Extraction Benchmarks

BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence

Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence