Learning with Noisy Ground Truth: From 2D Classification to 3D Reconstruction

Yangdi Lu,Wenbo He
2024-06-23
Abstract:Deep neural networks has been highly successful in data-intense computer vision applications, while such success relies heavily on the massive and clean data. In real-world scenarios, clean data sometimes is difficult to obtain. For example, in image classification and segmentation tasks, precise annotations of millions samples are generally very expensive and time-consuming. In 3D static scene reconstruction task, most NeRF related methods require the foundational assumption of the static scene (e.g. consistent lighting condition and persistent object positions), which is often violated in real-world scenarios. To address these problem, learning with noisy ground truth (LNGT) has emerged as an effective learning method and shows great potential. In this short survey, we propose a formal definition unify the analysis of LNGT LNGT in the context of different machine learning tasks (classification and regression). Based on this definition, we propose a novel taxonomy to classify the existing work according to the error decomposition with the fundamental definition of machine learning. Further, we provide in-depth analysis on memorization effect and insightful discussion about potential future research opportunities from 2D classification to 3D reconstruction, in the hope of providing guidance to follow-up research.
Computer Vision and Pattern Recognition,Artificial Intelligence,Machine Learning
What problem does this paper attempt to address?
The core problem that this paper attempts to solve is **how to effectively train deep neural networks (DNNs) when performing machine learning on datasets with noisy labels, in order to improve the generalization ability and prediction accuracy of the model**. Specifically, the paper focuses on two main aspects: 1. **The problem of noisy labels in 2D classification tasks**: - In the real world, it is very difficult and expensive to obtain large - scale and accurately labeled datasets. For example, in image classification and segmentation tasks, the cost of accurately labeling millions of samples is extremely high. - When using a dataset with noisy labels for training, DNNs tend to over - fit these wrong labels, resulting in a decline in test performance. 2. **The noise problem in 3D scene reconstruction tasks**: - Traditional 3D scene reconstruction methods based on NeRF (Neural Radiance Fields) and others assume that the scene is static and the lighting conditions are consistent. However, these assumptions are often violated in actual scenes, resulting in a significant decline in reconstruction quality. - In the case of the existence of distractors in multi - view images, how to effectively perform 3D scene reconstruction is a challenge. ### Solutions To address the above problems, the paper proposes the following solutions: - **Formal definition and classification**: The paper gives a formal definition of "Learning with Noisy Ground Truth (LNGT)" and proposes a new taxonomy to analyze existing LNGT work based on error decomposition. - **Utilization of the memory effect**: By analyzing the memory effect of DNNs on noisy labels during the training process, some improvement methods are proposed. For example, in 2D classification tasks, by introducing a weighted entropy term to minimize the prediction entropy, the over - fitting of the model to noisy labels can be reduced. - **Noise handling in 3D reconstruction**: For 3D scene reconstruction, the paper proposes a similar method to handle noisy labels. For example, by generating a dynamic weight mask to distinguish pure pixels from interfering pixels, thereby suppressing the influence of interfering pixels during the training process. ### Formula summary In 2D classification tasks, the cross - entropy loss function is expressed as: \[ L_{ce}=-\frac{1}{N}\sum_{i = 1}^{N}(\bar{\mathbf{y}}_i)^\top\log(\mathbf{p}_i) \] where $\bar{\mathbf{y}}_i$ is the observed noisy label, and $\mathbf{p}_i = S(\mathbf{z}_i)$ is the probability distribution after the softmax function. In 3D reconstruction tasks, the new mask NeRF loss function is expressed as: \[ L_{mask - nerf}=\sum_{r}M_r\|\hat{I}_r - I_r\|^2_2 \] where $M_r$ is the dynamic weight mask, $\hat{I}_r$ is the rendered color, and $I_r$ is the real color. Through these methods, the paper aims to provide a systematic framework to deal with the problem of noisy labels from 2D classification to 3D reconstruction.