Abstract: When collecting training data, completely eliminating incorrect annotations (noisy labels) is difficult and expensive, even with manual verification by experts from crowdsourcing platforms. On datasets that contain noisy labels, over-parameterized deep neural networks (DNNs) tend to overfit, leading to poor generalization and classification performance. As a result, noisy label learning (NLL) has received significant attention in recent years. Existing research shows that although DNNs eventually fit all training data, they fit clean samples first and only gradually overfit to noisy samples. Mainstream methods exploit this characteristic to divide the training data, but they face two issues: class imbalance in the resulting data subsets, and an optimization conflict between unsupervised contrastive representation learning and supervised learning. To address these issues, we propose BPT-PLR, a Balanced Partitioning and Training framework with Pseudo-Label Relaxed contrastive loss. It comprises two crucial processes: a balanced partitioning process with a two-dimensional Gaussian mixture model (BP-GMM) and a semi-supervised oversampling training process with a pseudo-label relaxed contrastive loss (SSO-PLR). The former uses both semantic feature information and model prediction results to identify noisy labels, and introduces a balancing strategy to keep the divided subsets as class-balanced as possible. The latter replaces the unsupervised contrastive loss with the latest pseudo-label relaxed contrastive loss, reducing optimization conflicts between semi-supervised and unsupervised contrastive losses to improve performance. We validate the effectiveness of BPT-PLR on four benchmark datasets in the NLL field: CIFAR-10/100, Animal-10N, and Clothing1M. Extensive experiments comparing BPT-PLR with state-of-the-art methods demonstrate that it achieves optimal or near-optimal performance.
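The class-imbalance issue raised in the abstract can be illustrated with a minimal sketch. It assumes each sample already has a "clean probability" (for example, a posterior from a mixture model) and compares a plain threshold split with a per-class quota split. The function names, the threshold of 0.5, and the quota rule are illustrative assumptions, not the BP-GMM procedure itself, whose details the abstract does not give.

```python
from collections import defaultdict


def threshold_partition(clean_prob, threshold=0.5):
    """Indices whose clean probability exceeds the threshold (no balancing)."""
    return [i for i, p in enumerate(clean_prob) if p > threshold]


def balanced_partition(clean_prob, labels, threshold=0.5):
    """Hypothetical balancing rule: take the same number of samples per class.

    The per-class quota is the size of the threshold-based clean set divided
    by the number of classes; within each class, the samples with the highest
    clean probability are kept.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    n_selected = len(threshold_partition(clean_prob, threshold))
    quota = max(1, n_selected // len(by_class))

    clean = []
    for y in sorted(by_class):
        ranked = sorted(by_class[y], key=lambda i: clean_prob[i], reverse=True)
        clean.extend(ranked[:quota])
    return sorted(clean)


# Toy data: class 0 looks mostly clean, class 1 looks mostly noisy.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
clean_prob = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3, 0.2, 0.1]

plain = threshold_partition(clean_prob)             # [0, 1, 2, 3, 4]
balanced = balanced_partition(clean_prob, labels)   # [0, 1, 4, 5]
```

In this toy case the plain threshold keeps four samples of class 0 and only one of class 1, whereas the quota rule keeps two of each. The trade-off is that a balanced split may admit some lower-confidence samples from under-represented classes.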

Imbalanced Multiple Noisy Labeling

Imbalanced Multiple Noisy Labeling for Supervised Learning

A Threshold Method for Imbalanced Multiple Noisy Labeling

Active Learning with Imbalanced Multiple Noisy Labeling

Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection

Pseudo Labels for Imbalanced Multi-Label Learning

Online Multi-Label Classification under Noisy and Changing Label Distribution

BPT-PLR: A Balanced Partitioning and Training Framework with Pseudo-Label Relaxed Contrastive Loss for Noisy Label Learning

Majority Voting and Pairing with Multiple Noisy Labeling

PLM: Partial Label Masking for Imbalanced Multi-label Classification

Learning With Noisy Labels Over Imbalanced Subpopulations

Consensus Algorithms for Biased Labeling in Crowdsourcing

DIRECT: Deep Active Learning under Imbalance and Label Noise

Uncertainty-Aware Learning against Label Noise on Imbalanced Datasets

Towards Imbalanced Large Scale Multi-label Classification with Partially Annotated Labels

Rebalancing Multi-Label Class-Incremental Learning

ALIM: Adjusting Label Importance Mechanism for Noisy Partial Label Learning

Learning Image Labels On-the-fly for Training Robust Classification Models

An Online Learning Approach to Improving the Quality of Crowd-Sourcing

When Noisy Labels Meet Long Tail Dilemmas: A Representation Calibration Method