Abstract: Machine learning (ML) is a data-driven approach, and as ML research has progressed, the problem of noisy labels has attracted widespread attention. Noisy labels can significantly reduce the accuracy of supervised classification models, so detecting as many noisy labels as possible in large datasets is an important task. In this study, a new method is proposed for detecting noisy labels in datasets. The method first uses a deep pre-trained network to extract a more accurate feature set from the image data. It then incorporates a tightness-based membership degree into the support vector data description (SVDD) model, yielding a model named TF-SVDD, to detect noisy data in the dataset. To simulate different types of label noise more accurately, we assume that the labels of the original datasets are all correct and construct noise sets with two methods: a density peak noise set and a random noise set. Experimental results demonstrate that TF-SVDD effectively detects noisy label data and surpasses the traditional SVDD algorithm and other methods in outlier detection accuracy, with average accuracy mostly exceeding 50% and reaching up to 80%. Furthermore, a novel measure called 'confidence' is used to correct noisy labels in the data. After the noisy labels are corrected, image classification accuracy improves significantly, with the average improvement ratio mostly exceeding 10% and reaching up to 30%.
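The abstract builds noise sets by corrupting labels that are assumed to be correct, using either random selection or a density peak method. The exact density peak selection rule is not given here, so the following is a minimal sketch under stated assumptions: `random_label_noise` flips a uniformly random subset of labels, while the hypothetical `density_peak_noise` flips the labels of the sparsest samples, ranked by the standard density peak quantities ρ (local density under a cutoff `d_c`) and δ (distance to the nearest denser point). All function names, the tie-breaking rule, and the choice of replacement class are assumptions.

```python
import math
import random


def _flip(labels, indices, num_classes, rng):
    """Replace each selected label with a different class drawn uniformly
    (an assumption, not the paper's specification). Needs num_classes >= 2."""
    noisy = list(labels)
    for i in indices:
        noisy[i] = rng.choice([c for c in range(num_classes) if c != labels[i]])
    return noisy


def random_label_noise(labels, num_classes, rate, seed=0):
    """Random noise set: flip a `rate` fraction of uniformly chosen labels.
    Returns (noisy_labels, flipped_indices)."""
    rng = random.Random(seed)
    n_flip = int(round(rate * len(labels)))
    flipped = sorted(rng.sample(range(len(labels)), n_flip))
    return _flip(labels, flipped, num_classes, rng), flipped


def density_peak_scores(points, d_c):
    """Standard density peak quantities for every point.

    rho[i]:   number of other points closer than the cutoff d_c.
    delta[i]: distance to the nearest point with strictly higher rho
              (a point with no denser point gets its largest distance).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def density_peak_noise(points, labels, num_classes, rate, d_c, seed=0):
    """Hypothetical density peak noise set: flip the labels of the sparsest
    samples (lowest rho, ties broken by largest delta)."""
    rho, delta = density_peak_scores(points, d_c)
    n_flip = int(round(rate * len(labels)))
    order = sorted(range(len(labels)), key=lambda i: (rho[i], -delta[i]))
    flipped = sorted(order[:n_flip])
    return _flip(labels, flipped, num_classes, random.Random(seed)), flipped
```

Both functions return the flipped indices, which can serve as ground truth when scoring how many noisy labels a detector recovers.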

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper addresses the problem of noisy labels in machine learning. Noisy labels can significantly reduce the accuracy of supervised classification models, so detecting and correcting them is crucial for model performance. The paper proposes a Tightness-based Fuzzy Support Vector Data Description (TF-SVDD) method for detecting noisy labels in large datasets.

### Main Contributions

1. **Construction of the Initial Noise Set**:
   - Using the traditional density peak clustering algorithm to construct the initial noise set.
2. **Tightness-based Fuzzy SVDD Model**:
   - Introducing a tightness-based fuzzy SVDD model that distinguishes noisy samples more accurately.
3. **New Confidence Metric**:
   - Proposing a new confidence metric to correct noisy labels.

### Method Overview

1. **Feature Extraction**:
   - Using a pre-trained ResNet-18 network to extract features from the image data.
2. **Generation of the Initial Noise Set**:
   - Constructing the noise set with two methods: random selection and the density peak algorithm.
3. **Fuzzy Membership Function**:
   - Designing a tightness-based fuzzy membership function that considers both the distance between each sample and its class center and the compactness of the samples within the class.
4. **TF-SVDD Model**:
   - Integrating the fuzzy membership function into the SVDD model and optimizing its objective function to detect noisy labels.
5. **Noisy Label Correction**:
   - Using the confidence metric to correct the detected noisy labels, then evaluating classification accuracy with an SVM.

### Experimental Results

The paper conducts experiments on three color image datasets (cats and dogs, fruits, and utensils) with 20%, 40%, and 60% random noise and density noise added. The results show that TF-SVDD outperforms traditional SVDD and other methods in both noisy label detection and classification accuracy.
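The TF-SVDD step integrates the fuzzy membership function into the SVDD model. The summary does not state the objective, so the following assumes the standard fuzzy SVDD formulation, in which each sample's membership $s_i \in (0,1]$ scales its slack penalty. Here $\phi$ is a feature map with kernel $K(\mathbf{x},\mathbf{y})=\phi(\mathbf{x})^\top\phi(\mathbf{y})$, $\mathbf{a}$ is the sphere center, $R$ the radius, and $C>0$ a trade-off parameter; one description is assumed to be trained per class on that class's (possibly noisy) samples.

$$
\begin{aligned}
\min_{R,\,\mathbf{a},\,\boldsymbol{\xi}}\quad & R^2 + C\sum_{i=1}^{n} s_i\,\xi_i\\
\text{s.t.}\quad & \lVert\phi(\mathbf{x}_i)-\mathbf{a}\rVert^2 \le R^2+\xi_i,\qquad \xi_i\ge 0,\qquad i=1,\dots,n
\end{aligned}
$$

Setting the derivatives of the Lagrangian with respect to $R$, $\mathbf{a}$ and $\xi_i$ to zero gives $\sum_i\alpha_i=1$, $\mathbf{a}=\sum_i\alpha_i\phi(\mathbf{x}_i)$ and $0\le\alpha_i\le s_iC$, which yields the dual

$$
\begin{aligned}
\max_{\boldsymbol{\alpha}}\quad & \sum_{i=1}^{n}\alpha_i K(\mathbf{x}_i,\mathbf{x}_i)-\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j)\\
\text{s.t.}\quad & \sum_{i=1}^{n}\alpha_i=1,\qquad 0\le\alpha_i\le s_i C
\end{aligned}
$$

The squared kernel distance of a sample $\mathbf{z}$ to the center is

$$
d^2(\mathbf{z}) = K(\mathbf{z},\mathbf{z}) - 2\sum_{i=1}^{n}\alpha_i K(\mathbf{x}_i,\mathbf{z}) + \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j),
$$

and $R^2 = d^2(\mathbf{x}_k)$ for any $\mathbf{x}_k$ with $0<\alpha_k<s_kC$. Under this reading, a training sample with $d^2(\mathbf{x}_i) > R^2$ lies outside its class description and would be flagged as a likely noisy label. Because a low membership $s_i$ lowers both the penalty on $\xi_i$ and the cap on $\alpha_i$, samples that the membership function considers unreliable can fall outside the sphere more cheaply.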
### Conclusion

The TF-SVDD method proposed in this study effectively detects and corrects noisy labels and significantly improves image classification accuracy. By introducing a tightness-based fuzzy membership function and a confidence metric, the method handles noisy labels better than traditional SVDD and the other compared methods.
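The conclusion credits two components, the tightness-based fuzzy membership function and the confidence metric, but this summary does not give their formulas. The sketch below is therefore a hypothetical illustration, not the paper's method: `tightness_membership` scores each sample by its distance to its class center and by its local tightness (mean distance to its `k` nearest same-class neighbours), each divided by the class average and combined through an exponential; `relabel_with_confidence` reassigns a flagged sample to the class holding a membership-weighted majority among its nearest unflagged neighbours. The function names, the exponential combination, and the `threshold` parameter are all assumptions.

```python
import math


def class_center(points):
    """Mean vector of a list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def tightness_membership(points, labels, k=3, eps=1e-9):
    """Hypothetical tightness-based fuzzy membership in (0, 1].

    Two per-sample quantities, each divided by its class average:
      * distance to the class centre;
      * tightness: mean distance to the k nearest same-class neighbours.
    Samples far from their centre and in sparse regions get low membership.
    """
    n = len(points)
    classes = sorted(set(labels))
    centers = {c: class_center([p for p, y in zip(points, labels) if y == c])
               for c in classes}
    d_center = [math.dist(points[i], centers[labels[i]]) for i in range(n)]
    tight = []
    for i in range(n):
        same = sorted(math.dist(points[i], points[j])
                      for j in range(n) if j != i and labels[j] == labels[i])
        nearest = same[:k]
        tight.append(sum(nearest) / len(nearest) if nearest else 0.0)
    mean_d, mean_t = {}, {}
    for c in classes:
        idx = [i for i in range(n) if labels[i] == c]
        mean_d[c] = sum(d_center[i] for i in idx) / len(idx)
        mean_t[c] = sum(tight[i] for i in idx) / len(idx)
    return [math.exp(-0.5 * (d_center[i] / (mean_d[labels[i]] + eps)
                             + tight[i] / (mean_t[labels[i]] + eps)))
            for i in range(n)]


def relabel_with_confidence(points, labels, membership, flagged, k=5, threshold=0.6):
    """Hypothetical confidence-based correction of flagged samples.

    Confidence of class c = membership-weighted share of the k nearest
    unflagged samples that carry label c. A flagged label is replaced only
    when the top class reaches `threshold`; otherwise it is left unchanged.
    """
    flagged_set = set(flagged)
    clean = [j for j in range(len(points)) if j not in flagged_set]
    corrected = list(labels)
    for i in flagged:
        neighbours = sorted(clean, key=lambda j: math.dist(points[i], points[j]))[:k]
        votes = {}
        for j in neighbours:
            votes[labels[j]] = votes.get(labels[j], 0.0) + membership[j]
        total = sum(votes.values())
        if total == 0:
            continue
        best = max(votes, key=votes.get)
        if votes[best] / total >= threshold:
            corrected[i] = best
    return corrected
```

In a full pipeline the `flagged` indices would come from the noisy-label detection step; in a quick experiment they can simply be passed in by hand, for example the index of a single deliberately mislabeled point.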

The fuzzy support vector data description based on tightness for noisy label detection
