Abstract: Machine learning models experience deteriorated performance when trained in the presence of noisy labels. This is particularly problematic for medical tasks, such as survival prediction, which typically face high label noise complexity with few clear-cut solutions. Inspired by the large fluctuations across folds in the cross-validation performance of survival analyses, we design Monte-Carlo experiments to show that such fluctuation could be caused by label noise. We propose two novel and straightforward label noise detection algorithms that effectively identify noisy examples by pinpointing the samples that more frequently contribute to inferior cross-validation results. We first introduce Repeated Cross-Validation (ReCoV), a parameter-free label noise detection algorithm that is robust to model choice. We further develop fastReCoV, a less robust but more tractable and efficient variant of ReCoV suitable for deep learning applications. Through extensive experiments, we show that ReCoV and fastReCoV achieve state-of-the-art label noise detection performance in a wide range of modalities, models and tasks, including survival analysis, which has yet to be addressed in the literature. Our code and data are publicly available at https://github.com/GJiananChen/ReCoV.

What problem does this paper attempt to address?

This paper addresses the performance degradation that machine-learning models suffer when they are trained on data with label noise, especially in medical tasks. Label noise means that some samples in the dataset carry inaccurate or incorrect labels, which is common in real-world datasets and particularly so in the medical field.

Specifically, the paper focuses on detecting the samples whose labels are noisy. Label noise is particularly complex in medical tasks such as survival prediction, which lack clear-cut solutions. Observing that performance fluctuates strongly between folds during cross-validation, the authors hypothesize that these fluctuations may be caused by label noise and design Monte-Carlo experiments to test this hypothesis.

### The methods proposed in the paper

The paper proposes two novel and straightforward label-noise-detection algorithms:

1. **Repeated Cross-Validation (ReCoV)**: A parameter-free label-noise-detection algorithm that is robust to model choice. It repeats cross-validation many times and identifies the samples that most frequently contribute to poor validation results, flagging them as likely to have noisy labels.
2. **fastReCoV**: A more efficient but less robust variant of ReCoV, suitable for deep-learning applications. It improves computational efficiency by introducing techniques such as weighted sampling and an exponential moving average.

### Experimental verification

Through extensive experiments, the authors show that ReCoV and fastReCoV perform well across multiple modalities, models, and tasks, including survival analysis, which has not yet been addressed in the label-noise literature.

### Conclusion

The main contributions of the paper are:

- Discovering that performance fluctuations across folds during cross-validation can reflect the existence of label noise.
- Proposing two effective label-noise-detection algorithms, ReCoV and fastReCoV, which can achieve state-of-the-art performance in multiple tasks.
- Demonstrating the effectiveness and practicality of these methods in dealing with real-world label noise, especially on medical-image datasets.

In summary, this paper provides a new perspective and tool for dealing with the label-noise problem, which helps to improve the performance and reliability of machine-learning models in the presence of label noise.
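
The repeated cross-validation idea summarized above can be illustrated with a small sketch. This is not the authors' implementation (their code is in the linked repository); it is one plausible reading of "pinpointing the samples that more frequently contribute to inferior cross-validation results". The synthetic two-class data, the nearest-centroid classifier, the "count membership in the worst fold" rule, and all names (`make_noisy_data`, `centroid_accuracy`, `worst_fold_counts`) are assumptions made for illustration only.

```python
import numpy as np

def make_noisy_data(n=200, n_noisy=20, seed=0):
    """Two well-separated Gaussian classes with a few flipped labels (toy data)."""
    rng = np.random.default_rng(seed)
    y_true = np.repeat([0, 1], n // 2)
    # Class 0 is centred at (-2, -2), class 1 at (+2, +2).
    X = rng.normal(size=(n, 2)) + np.where(y_true[:, None] == 1, 2.0, -2.0)
    noisy_idx = rng.choice(n, size=n_noisy, replace=False)
    y = y_true.copy()
    y[noisy_idx] = 1 - y[noisy_idx]  # inject label noise by flipping labels
    return X, y, noisy_idx

def centroid_accuracy(X_tr, y_tr, X_va, y_va):
    """Fit a nearest-centroid classifier and return validation accuracy."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = np.linalg.norm(X_va - c0, axis=1)
    d1 = np.linalg.norm(X_va - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return (pred == y_va).mean()

def worst_fold_counts(X, y, n_runs=300, k=5, seed=1):
    """Count how often each sample lands in the worst-scoring validation fold."""
    rng = np.random.default_rng(seed)
    n = len(y)
    counts = np.zeros(n, dtype=int)
    for _ in range(n_runs):
        # A fresh random partition into k folds for every run.
        folds = np.array_split(rng.permutation(n), k)
        scores = []
        for va in folds:
            tr = np.setdiff1d(np.arange(n), va)
            scores.append(centroid_accuracy(X[tr], y[tr], X[va], y[va]))
        # Assumed rule: every sample in the lowest-accuracy fold gets one vote.
        counts[folds[int(np.argmin(scores))]] += 1
    return counts

X, y, noisy_idx = make_noisy_data()
counts = worst_fold_counts(X, y)

# Rank samples by vote count and check how many of the top ones were flipped.
top = np.argsort(-counts)[: len(noisy_idx)]
precision = np.isin(top, noisy_idx).mean()
print("mean votes (noisy):", counts[noisy_idx].mean())
print("mean votes (clean):", np.delete(counts, noisy_idx).mean())
print("precision among top-ranked samples:", precision)
```

In this toy setting a fold scores poorly mainly when it happens to contain many flipped labels, so flipped samples accumulate votes faster than clean ones. For survival analysis the accuracy score would be replaced by a survival metric such as the concordance index, which this sketch does not attempt.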

Cross-Validation Is All You Need: A Statistical Approach To Label Noise Estimation

An Ensemble Noise-Robust K-fold Cross-Validation Selection Method for Noisy Labels

Iterative Cross Learning on Noisy Labels

Improving Medical Images Classification with Label Noise Using Dual-Uncertainty Estimation

Robustness and Reliability When Training With Noisy Labels

Co-Correcting: Noise-Tolerant Medical Image Classification via Mutual Label Correction

Overview of model validation for survival regression model with competing risks using melanoma study data

Analyze the Robustness of Classifiers under Label Noise

Deep learning with noisy labels in medical prediction problems: a scoping review

Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis

L2B: Learning to Bootstrap Robust Models for Combating Label Noise

Bayesian statistics guided label refurbishment mechanism: Mitigating label noise in medical image classification

Label-noise-tolerant medical image classification via self-attention and self-supervised learning

Curriculum Fine-tuning of Vision Foundation Model for Medical Image Classification Under Label Noise

Is K-fold cross validation the best model selection method for Machine Learning?

Limited Gradient Descent: Learning With Noisy Labels

Cross-validation in high-dimensional spaces: a lifeline for least-squares models and multi-class LDA

NoiseRank: Unsupervised Label Noise Reduction with Dependence Models

Making Deep Neural Networks Robust to Label Noise: a Loss Correction Approach

Contrastive Learning Joint Regularization for Pathological Image Classification with Noisy Labels

Empirical investigation of multi-source cross-validation in clinical ECG classification