Abstract: Deep neural networks have been highly successful in data-intensive computer vision applications, but this success relies heavily on massive amounts of clean data. In real-world scenarios, clean data is sometimes difficult to obtain. For example, in image classification and segmentation tasks, precisely annotating millions of samples is generally very expensive and time-consuming. In 3D static scene reconstruction, most NeRF-related methods rely on the foundational assumption of a static scene (e.g., consistent lighting conditions and persistent object positions), which is often violated in real-world scenarios. To address these problems, learning with noisy ground truth (LNGT) has emerged as an effective learning method and shows great potential. In this short survey, we propose a formal definition that unifies the analysis of LNGT in the context of different machine learning tasks (classification and regression). Based on this definition, we propose a novel taxonomy to classify the existing work according to the error decomposition with the fundamental definition of machine learning. Further, we provide an in-depth analysis of the memorization effect and an insightful discussion of potential future research opportunities from 2D classification to 3D reconstruction, in the hope of providing guidance for follow-up research.

What problem does this paper attempt to address?

The core problem this paper addresses is **how to effectively train deep neural networks (DNNs) on datasets with noisy labels, so as to improve the model's generalization ability and prediction accuracy**. Specifically, the paper focuses on two main aspects:

1. **Noisy labels in 2D classification tasks**:
   - In the real world, large-scale, accurately labeled datasets are difficult and expensive to obtain. For example, in image classification and segmentation tasks, the cost of accurately labeling millions of samples is extremely high.
   - When trained on a dataset with noisy labels, DNNs tend to overfit the wrong labels, which degrades test performance.
2. **Noise in 3D scene reconstruction tasks**:
   - 3D scene reconstruction methods based on NeRF (Neural Radiance Fields) and related approaches assume that the scene is static and the lighting conditions are consistent. These assumptions are often violated in real scenes, which significantly degrades reconstruction quality.
   - Performing 3D scene reconstruction effectively when the multi-view images contain distractors remains a challenge.

### Solutions

To address the above problems, the paper proposes the following solutions:

- **Formal definition and classification**: The paper gives a formal definition of "Learning with Noisy Ground Truth (LNGT)" and proposes a new taxonomy, based on error decomposition, for analyzing existing LNGT work.
- **Utilization of the memorization effect**: The paper analyzes the memorization effect of DNNs on noisy labels during training and discusses improvements that build on it. For example, in 2D classification tasks, adding a weighted entropy term that minimizes the prediction entropy can reduce the model's overfitting to noisy labels.
- **Noise handling in 3D reconstruction**: For 3D scene reconstruction, the paper proposes a similar way to handle noisy ground truth. For example, a dynamic weight mask is generated to distinguish clean pixels from distractor pixels, which suppresses the influence of the distractor pixels during training.

### Formula summary

In 2D classification tasks, the cross-entropy loss function is expressed as:

\[ L_{ce} = -\frac{1}{N}\sum_{i=1}^{N}(\bar{\mathbf{y}}_i)^\top\log(\mathbf{p}_i) \]

where $\bar{\mathbf{y}}_i$ is the observed noisy label, and $\mathbf{p}_i = S(\mathbf{z}_i)$ is the probability distribution produced by the softmax function $S$.

In 3D reconstruction tasks, the masked NeRF loss function is expressed as:

\[ L_{\text{mask-nerf}} = \sum_{r} M_r\|\hat{I}_r - I_r\|^2_2 \]

where $M_r$ is the dynamic weight mask, $\hat{I}_r$ is the rendered color, and $I_r$ is the observed (ground-truth) color.

Through these methods, the paper aims to provide a systematic framework for dealing with noisy labels from 2D classification to 3D reconstruction.
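The two loss formulas above can be made concrete with a small sketch. This is only an illustration in plain Python, not the paper's implementation: the function names (`softmax`, `cross_entropy`, `masked_nerf_loss`), the list-based data layout, and the `eps` guard inside the logarithm are assumptions introduced here. How the mask $M_r$ is actually produced is not specified in the summary, so the sketch takes it as a given input.

```python
import math


def softmax(z):
    """Numerically stable softmax S(z) over one list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(noisy_labels, logits, eps=1e-12):
    """L_ce = -(1/N) * sum_i  y_bar_i^T log(p_i),  with p_i = softmax(z_i).

    noisy_labels: list of one-hot (or soft) label vectors y_bar_i (possibly wrong).
    logits:       list of logit vectors z_i, one per sample.
    eps:          assumed small constant that keeps log() away from zero.
    """
    n = len(logits)
    total = 0.0
    for y_bar, z in zip(noisy_labels, logits):
        p = softmax(z)
        total += sum(y * math.log(pk + eps) for y, pk in zip(y_bar, p))
    return -total / n


def masked_nerf_loss(mask, rendered, observed):
    """L_mask-nerf = sum_r M_r * ||I_hat_r - I_r||_2^2.

    mask:     per-ray weights M_r (assumed given; near 0 for distractor rays).
    rendered: per-ray rendered colors I_hat_r, e.g. [r, g, b].
    observed: per-ray observed colors I_r, same layout.
    """
    total = 0.0
    for m_r, c_hat, c in zip(mask, rendered, observed):
        squared_error = sum((a - b) ** 2 for a, b in zip(c_hat, c))
        total += m_r * squared_error
    return total
```

With uniform logits over two classes, `cross_entropy([[1, 0]], [[0.0, 0.0]])` evaluates to approximately $\log 2$, since the model assigns probability 0.5 to the labeled class. In `masked_nerf_loss`, a ray whose mask weight is 0 contributes nothing to the sum regardless of its color error, which is how the mask suppresses distractor pixels.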

Learning with Noisy Ground Truth: From 2D Classification to 3D Reconstruction

3D Reconstruction From Traditional Methods to Deep Learning

Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners

Survey on Fundamental Deep Learning 3D Reconstruction Techniques

RegGeoNet: Learning Regular Representations for Large-Scale 3D Point Clouds

Learning Signed Distance Functions from Noisy 3D Point Clouds via Noise to Noise Mapping

Generalized Label-Efficient 3D Scene Parsing via Hierarchical Feature Aligned Pre-Training and Region-Aware Fine-tuning

From Chaos to Clarity: 3DGS in the Dark

A review on deep learning techniques for 3D sensed data classification

3D Reconstruction Using Deep Learning: a Survey.

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Advancing 3D Object Grounding Beyond a Single 3D Scene

Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision

Deep Learning Based 3D Segmentation: A Survey

Learning without Forgetting for 3D Point Cloud Objects

Learning with Noisy Class Labels for Instance Segmentation

Learning Reliable Gradients from Undersampled Circular Light Field for 3D Reconstruction.

Learned feature embeddings for non-line-of-sight imaging and recognition

ULD-Net: 3D unsupervised learning by dense similarity learning with equivariant-crop

Deep Learning for 3D Reconstruction, Augmentation, and Registration: A Review Paper

Learning by Restoring Broken 3D Geometry