Abstract: 3D object detection plays a crucial role in numerous intelligent vision systems. Detection in the open world inevitably encounters various adverse scenes, such as dense fog, heavy rain, and low-light conditions. Although existing efforts, which primarily focus on diversifying network architectures or training schemes, have brought significant progress in 3D object detection, most of these learnable modules fail in adverse scenes, which hinders detection performance. To address this issue, this paper proposes a monocular 3D detection model designed to perceive twin depth in adverse scenes, termed MonoTDP, which effectively mitigates the degradation of detection performance in various harsh environments. Specifically, we first introduce an adaptive learning strategy that helps the model handle uncontrollable weather conditions, significantly reducing the degradation caused by various degrading factors. Then, to address depth/content loss in adverse regions, we propose a novel twin depth perception module that simultaneously estimates scene and object depth, enabling the integration of scene-level and object-level features. Additionally, we assemble a new adverse 3D object detection dataset encompassing a wide range of challenging scenes, including rainy, foggy, and low-light weather conditions, with each type of scene containing 7,481 images. Experimental results demonstrate that our proposed method outperforms current state-of-the-art approaches by an average of 3.12% in terms of AP_R40 for the Car category across various adverse environments.
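The abstract reports its results in AP_R40. As a rough illustration of that metric, the sketch below computes average precision from an already-computed precision/recall curve by sampling the interpolated precision at the 40 recall positions 1/40, 2/40, ..., 1 (recall 0 is excluded), the convention used by the KITTI 3D benchmark. It assumes detections have already been matched to ground truth (for the Car class, KITTI uses a 3D IoU threshold of 0.7); the function name `ap_r40` and its inputs are illustrative and not part of any official evaluation toolkit.

```python
from typing import Sequence


def ap_r40(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Average precision over 40 recall positions (KITTI R40 convention).

    recalls[i] and precisions[i] describe one operating point of the
    precision/recall curve, e.g. after keeping the top i+1 detections.
    """
    if len(recalls) != len(precisions):
        raise ValueError("recalls and precisions must have the same length")

    total = 0.0
    for i in range(1, 41):
        r = i / 40  # recall positions 1/40 ... 40/40; recall 0 is excluded
        # Interpolated precision: best precision at any recall >= r,
        # or 0 if the detector never reaches this recall.
        total += max(
            (p for rc, p in zip(recalls, precisions) if rc >= r),
            default=0.0,
        )
    return total / 40


if __name__ == "__main__":
    # A detector that is perfectly precise but only finds half the cars.
    print(ap_r40([0.25, 0.5], [1.0, 1.0]))  # 0.5
```

Excluding recall 0 means a single confident correct detection cannot earn a sizable score on its own, which is the main weakness of the older 11-point (R11) convention that R40 replaced.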

What problem does this paper attempt to address?

This paper addresses the drop in monocular 3D object detection performance under adverse weather conditions. The authors propose a model named MonoTDP (Monocular Twin Depth Perception) to tackle the following challenges:

1. **Image quality degradation in harsh environments**: In adverse conditions such as heavy fog, heavy rain, and low light, image quality declines sharply. Objects become partially or completely obscured, blurred, and low in contrast, which hurts detection performance.
2. **Loss of depth information**: With only a single view, monocular 3D object detection must recover depth from a 2D image, which is inherently ambiguous. The resulting uncertainty in depth estimation degrades the accurate localization and classification of objects.
3. **Scarcity of datasets**: There is no comprehensive dataset covering diverse complex environments and adverse weather conditions. This limits the training and validation of detection algorithms under these conditions and hinders the development of more robust and adaptable 3D object detection methods.

To address these problems, the authors propose a new monocular 3D object detection method, MonoTDP, built on the following key components:

- **Adaptive learning strategy**: Introduced during training, it penalizes misperception, which improves the model's adaptability to adverse weather and helps it extract more robust features.
- **Twin depth perception module**: Estimates scene depth and object depth simultaneously. Combining scene-level and object-level features compensates for the depth information lost in degraded regions and improves depth-estimation accuracy.
- **New dataset**: A new 3D object detection dataset covering several adverse weather conditions (light fog, heavy fog, heavy rain, torrential rain, low light, and others), with 7,481 images per condition, is constructed to support training and evaluating the model.

With these components, MonoTDP substantially improves the performance and robustness of monocular 3D object detection across harsh environments. Experiments show that it outperforms existing state-of-the-art methods by an average of 3.12% in AP_R40 for the Car category across multiple adverse environments.
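The loss-of-depth-information challenge can be made concrete with the standard pinhole relation between an object's physical height H (metres), its image height h (pixels), and its depth z: z = f * H / h, where f is the focal length in pixels. Differentiating gives |dz/dh| = f * H / h^2 = z^2 / (f * H), so a fixed pixel error costs more depth the farther away the object is; blur and low contrast in fog, rain, or darkness make box edges, and therefore h, less certain. The focal length (720 px) and car height (1.5 m) below are assumed round values for illustration, not parameters taken from the paper.

```python
def depth_from_height(focal_px: float, height_m: float, height_px: float) -> float:
    """Pinhole depth estimate z = f * H / h."""
    if height_px <= 0:
        raise ValueError("image height must be positive")
    return focal_px * height_m / height_px


def depth_error_per_pixel(focal_px: float, height_m: float, height_px: float) -> float:
    """Magnitude of dz/dh: metres of depth error per pixel of height error."""
    if height_px <= 0:
        raise ValueError("image height must be positive")
    return focal_px * height_m / height_px ** 2


if __name__ == "__main__":
    FOCAL_PX = 720.0    # assumed focal length in pixels
    CAR_HEIGHT_M = 1.5  # assumed physical car height in metres
    for h in (36.0, 18.0):
        z = depth_from_height(FOCAL_PX, CAR_HEIGHT_M, h)
        dz = depth_error_per_pixel(FOCAL_PX, CAR_HEIGHT_M, h)
        print(f"h = {h:.0f} px -> z = {z:.1f} m, about {dz:.2f} m per pixel of error")
```

Halving the object's apparent size doubles its depth but quadruples the depth error per pixel (about 0.83 m at 30 m versus 3.33 m at 60 m), which is one reason monocular depth is hardest for small, distant, or poorly visible objects.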
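Neither the abstract nor the summary specifies how the twin depth perception module combines its two estimates; per the abstract, it integrates scene-level and object-level features. As a much simpler stand-in that shows why two depth sources help, the sketch below fuses a per-object depth with the scene depth sampled at the object's projected center, using inverse-variance weighting and assuming each branch also predicts a standard deviation. The `DepthEstimate` type, the nearest-pixel sampling, and the weighting rule are all hypothetical choices for illustration, not MonoTDP's actual design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DepthEstimate:
    depth: float  # metres
    sigma: float  # predicted standard deviation in metres; must be > 0


def sample_scene_depth(depth_map: List[List[float]], u: float, v: float,
                       sigma: float) -> DepthEstimate:
    """Nearest-pixel lookup of a dense scene depth map at image point (u, v)."""
    rows, cols = len(depth_map), len(depth_map[0])
    # Clamp so that centers projected slightly outside the image still work.
    col = min(max(int(round(u)), 0), cols - 1)
    row = min(max(int(round(v)), 0), rows - 1)
    return DepthEstimate(depth_map[row][col], sigma)


def fuse_depths(obj: DepthEstimate, scene: DepthEstimate) -> DepthEstimate:
    """Inverse-variance weighted average of two independent depth estimates."""
    if obj.sigma <= 0 or scene.sigma <= 0:
        raise ValueError("sigmas must be positive")
    w_obj = 1.0 / obj.sigma ** 2
    w_scene = 1.0 / scene.sigma ** 2
    depth = (w_obj * obj.depth + w_scene * scene.depth) / (w_obj + w_scene)
    # The fused sigma is never larger than the smaller of the two input sigmas.
    sigma = (1.0 / (w_obj + w_scene)) ** 0.5
    return DepthEstimate(depth, sigma)


if __name__ == "__main__":
    scene_map = [[34.0, 34.0], [34.0, 34.0]]  # toy 2x2 scene depth map
    scene = sample_scene_depth(scene_map, u=0.6, v=1.2, sigma=2.0)
    clear = fuse_depths(DepthEstimate(30.0, 1.0), scene)
    foggy = fuse_depths(DepthEstimate(30.0, 6.0), scene)  # object cues degraded
    print(f"clear: {clear.depth:.2f} m, foggy: {foggy.depth:.2f} m")
```

When fog or darkness inflates the object branch's uncertainty, the fused depth shifts toward the scene estimate (30.80 m in the clear case versus 33.60 m in the foggy case of the toy run), a very reduced version of the motivation given for combining scene-level and object-level cues.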

MonoTDP: Twin Depth Perception for Monocular 3D Object Detection in Adverse Scenes
