Abstract: Monocular 3D object detection (Mono 3Det) aims to identify 3D objects from a single RGB image. However, existing methods often assume that training and test data follow the same distribution, which may not hold in real-world test scenarios. To address this out-of-distribution (OOD) problem, we explore a new adaptation paradigm for Mono 3Det, termed Fully Test-time Adaptation. It aims to adapt a well-trained model to unlabeled test data by handling potential data distribution shifts at test time, without access to training data or test labels. However, applying this paradigm to Mono 3Det poses significant challenges, because OOD test data cause a remarkable decline in object detection scores. This decline conflicts with the pre-defined score thresholds of existing detection methods, leading to severe object omissions (i.e., rare positive detections and many false negatives). Consequently, the scarcity of positive detections and the abundance of noisy predictions cause test-time adaptation to fail in Mono 3Det. To handle this problem, we propose a novel Monocular Test-Time Adaptation (MonoTTA) method based on two new strategies. 1) Reliability-driven adaptation: we empirically find that high-score objects remain reliable and that optimizing them can enhance confidence across all detections. Thus, we devise a self-adaptive strategy to identify reliable objects for model adaptation, which discovers potential objects and alleviates omissions. 2) Noise-guard adaptation: since high-score objects may be scarce, we develop a negative regularization term that exploits the numerous low-score objects via negative learning, preventing overfitting to noise and trivial solutions. Experimental results show that MonoTTA brings significant performance gains for Mono 3Det models in OOD test scenarios: approximately 190% on average on KITTI and 198% on nuScenes.
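The omission problem and the reliability-driven idea can be illustrated with a small sketch. The abstract does not specify how MonoTTA selects reliable objects, so the rule below (an object is reliable if its score exceeds the batch mean plus `k` standard deviations) is a hypothetical stand-in. The function names, the fixed threshold of 0.2, and the uniform ×0.2 score drop used to mimic an OOD shift are likewise assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def fixed_threshold_detections(scores, threshold=0.2):
    """Conventional post-processing: keep detections above a fixed, pre-defined score."""
    return [s for s in scores if s >= threshold]


def adaptive_reliable_mask(scores, k=1.0, floor=0.05):
    """Hypothetical self-adaptive rule (not MonoTTA's published rule):
    an object counts as reliable if its score exceeds mean + k * std
    of the current test batch, with `floor` as a lower bound on the threshold."""
    tau = max(mean(scores) + k * pstdev(scores), floor)
    return [s >= tau for s in scores], tau


if __name__ == "__main__":
    in_dist = [0.9, 0.8, 0.6, 0.3, 0.1]  # illustrative scores, not real model outputs
    ood = [0.2 * s for s in in_dist]     # assumed uniform score drop under distribution shift

    print(len(fixed_threshold_detections(in_dist)))  # 4 objects survive the fixed threshold
    print(len(fixed_threshold_detections(ood)))      # 0 objects survive: every object is omitted
    mask, tau = adaptive_reliable_mask(ood)
    print(mask)  # only the top-ranked object is kept as reliable
```

Because the ranking of the scores survives the shift even though their absolute values do not, a batch-relative threshold still picks out the top object, whereas the fixed threshold discards everything. Any statistic that tracks the current score distribution (a quantile, for example) would serve the same illustrative purpose.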

What problem does this paper attempt to address?

The paper addresses the problem of out-of-distribution (OOD) data in practical applications of Monocular 3D Object Detection (Mono 3Det). Existing Mono 3Det methods typically assume that training and test data follow the same distribution. In real-world deployments this assumption often fails, and model performance drops significantly.

To tackle this, the authors explore a new adaptation paradigm called Fully Test-Time Adaptation (Fully TTA), which adapts a pre-trained model to unlabeled test data under potential distribution shifts, without access to the training data or test labels. Applying this paradigm to monocular 3D object detection is difficult because OOD test data cause a substantial drop in detection scores. The lowered scores conflict with the predefined score thresholds of existing detectors, resulting in severe object omissions (i.e., rare positive detections and numerous false negatives). With so few positive detections and so many noisy predictions, test-time adaptation fails.

To address this challenge, the authors propose Monocular Test-Time Adaptation (MonoTTA), which is built on two new strategies:

1. **Reliability-Driven Adaptation**: The authors empirically find that high-score objects remain reliable, and that optimizing these objects raises confidence across all detections. This strategy therefore uses a self-adaptive scheme to identify reliable objects for model adaptation, which discovers potential objects and mitigates omissions.
2. **Noise-Guard Adaptation**: Because high-score objects may be scarce, this strategy adds a negative regularization term that exploits the numerous low-score objects via negative learning, preventing the model from overfitting to noise and collapsing to trivial solutions.
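The noise-guard strategy builds on negative learning, where a model is trained on a complementary label ("this object is not class k") instead of a possibly wrong positive label. The sketch below is a generic instance of that idea, not MonoTTA's exact loss. It assumes per-object class-probability vectors, a pseudo-label cross-entropy term for reliable objects, and a complementary label drawn at random from the non-top-1 classes for low-score objects, as in standard negative learning. The function names, the weight `lam`, and these design choices are all hypothetical.

```python
import math
import random


def top1(probs):
    """Index of the highest-probability class."""
    return max(range(len(probs)), key=probs.__getitem__)


def positive_loss(probs, eps=1e-8):
    """Assumed self-training term for a reliable object: cross-entropy
    with its own top-1 prediction as the pseudo-label, i.e. -log p_top."""
    return -math.log(max(probs[top1(probs)], eps))


def negative_loss(probs, rng, eps=1e-8):
    """Negative-learning term for a low-score object: sample a complementary
    label k (any class except the top-1 prediction) and minimise -log(1 - p_k).
    Asserting only 'not class k' is far less likely to be wrong than
    asserting the noisy top-1 label as ground truth."""
    top = top1(probs)
    k = rng.choice([c for c in range(len(probs)) if c != top])
    return -math.log(max(1.0 - probs[k], eps))


def adaptation_loss(class_probs, reliable_mask, lam=1.0, seed=0):
    """Total loss = mean positive term over reliable objects
    + lam * mean negative term over the remaining low-score objects."""
    rng = random.Random(seed)  # seeded so complementary labels are reproducible
    pos = [positive_loss(p) for p, r in zip(class_probs, reliable_mask) if r]
    neg = [negative_loss(p, rng) for p, r in zip(class_probs, reliable_mask) if not r]
    l_pos = sum(pos) / len(pos) if pos else 0.0
    l_neg = sum(neg) / len(neg) if neg else 0.0
    return l_pos + lam * l_neg, l_pos, l_neg
```

For example, with one reliable object predicted as `[0.7, 0.2, 0.1]` and one low-score object predicted as `[0.6, 0.2, 0.2]`, the positive term is -log 0.7 ≈ 0.357 and the negative term is -log 0.8 ≈ 0.223, whichever non-top-1 class is sampled. Keeping the negative term separate lets the many low-score objects contribute a training signal without being treated as confident pseudo-labels.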
Experimental results show that MonoTTA significantly improves the performance of monocular 3D object detection models in OOD test scenarios: the average improvement is about 190% on the KITTI dataset and about 198% on the nuScenes dataset.

Fully Test-Time Adaptation for Monocular 3D Object Detection

MonoTDP: Twin Depth Perception for Monocular 3D Object Detection in Adverse Scenes

Weakly Supervised Test-Time Domain Adaptation for Object Detection

DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection

MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation

Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments

Better Regression Makes Better Test-time Adaptive 3D Object Detection

MLFA: Toward Realistic Test Time Adaptive Object Detection by Multi-Level Feature Alignment

Unsupervised Domain Adaptation for Monocular 3D Object Detection Via Self-training

STFAR: Improving Object Detection Robustness at Test-Time by Self-Training with Feature Alignment Regularization

MonoSAID: Monocular 3D Object Detection Based on Scene-Level Adaptive Instance Depth Estimation

MonoAux: Fully Exploiting Auxiliary Information and Uncertainty for Monocular 3D Object Detection

Test-time Adaptation in the Dynamic World with Compound Domain Knowledge Management

MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection

Single Image Test-Time Adaptation for Segmentation

Test-Time Adaptation of 3D Point Clouds via Denoising Diffusion Models

Multi-Modal Continual Test-Time Adaptation for 3D Semantic Segmentation

MonoMM: A Multi-scale Mamba-Enhanced Network for Real-time Monocular 3D Object Detection

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

Universal Test-time Adaptation through Weight Ensembling, Diversity Weighting, and Prior Correction

Augment and Criticize: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection