MonoTDP: Twin Depth Perception for Monocular 3D Object Detection in Adverse Scenes

Xingyuan Li, Jinyuan Liu, Yixin Lei, Long Ma, Xin Fan, Risheng Liu
DOI: https://doi.org/10.48550/arXiv.2305.10974
2023-05-25
Abstract: 3D object detection plays a crucial role in numerous intelligent vision systems. Detection in the open world inevitably encounters various adverse scenes, such as dense fog, heavy rain, and low light conditions. Although existing efforts primarily focus on diversifying network architecture or training schemes, resulting in significant progress in 3D object detection, most of these learnable modules fail in adverse scenes, thereby hindering detection performance. To address this issue, this paper proposes a monocular 3D detection model designed to perceive twin depth in adverse scenes, termed MonoTDP, which effectively mitigates the degradation of detection performance in various harsh environments. Specifically, we first introduce an adaptive learning strategy to aid the model in handling uncontrollable weather conditions, significantly resisting degradation caused by various degrading factors. Then, to address the depth/content loss in adverse regions, we propose a novel twin depth perception module that simultaneously estimates scene and object depth, enabling the integration of scene-level features and object-level features. Additionally, we assemble a new adverse 3D object detection dataset encompassing a wide range of challenging scenes, including rainy, foggy, and low light weather conditions, with each type of scene containing 7,481 images. Experimental results demonstrate that our proposed method outperforms current state-of-the-art approaches by an average of 3.12% in terms of AP_R40 for car category across various adverse environments.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the decline in monocular 3D object detection performance under adverse weather conditions. Specifically, the authors propose a model named MonoTDP (Monocular Twin Depth Perception) to address the following challenges:

1. **Image Quality Degradation in Harsh Environments**
   - In adverse conditions such as heavy fog, heavy rain, and low light, image quality declines significantly: objects are partially or completely occluded or blurred, and contrast is reduced, all of which degrade detection performance.
2. **Loss of Depth Information**
   - With only a single view, monocular 3D object detection struggles to recover depth from 2D images. The resulting ambiguity and uncertainty in depth estimation impair the accurate localization and classification of objects.
3. **Scarcity of Datasets**
   - The lack of a comprehensive dataset covering complex environments and adverse weather conditions limits the training and validation of detection algorithms under these conditions and hinders the development of more robust and adaptable 3D object detection methods.

To solve these problems, the authors propose a new monocular 3D object detection method, MonoTDP, with the following key components:

- **Adaptive Learning Strategy**
  - An adaptive learning strategy is introduced during training. By penalizing misperception, it improves the model's adaptability to adverse weather conditions and helps the model extract more robust features.
- **Twin Depth Perception Module**
  - The module estimates scene depth and object depth simultaneously. Combining scene-level and object-level features compensates for the loss of depth information in adverse regions and improves the accuracy of depth estimation.
- **New Dataset**
  - A new 3D object detection dataset covering various adverse weather conditions (such as light fog, heavy fog, heavy rain, torrential rain, and low light) is constructed, with 7,481 images for each type of scene, to support training and evaluation of the model.

With these improvements, MonoTDP improves the performance and robustness of monocular 3D object detection in various harsh environments. The experimental results show that, for the car category, the method's AP_R40 is on average 3.12% higher than that of existing state-of-the-art methods across multiple adverse environments.
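For readers unfamiliar with the AP_R40 metric cited in the abstract, the following is a minimal sketch of how such a score is commonly computed: interpolated precision is averaged over 40 evenly spaced recall positions (1/40, 2/40, ..., 1). The function name `ap_r40` and its inputs are illustrative assumptions, not code from the paper, and the sketch assumes the precision/recall points have already been obtained by matching detections to ground truth, which is out of scope here.

```python
from typing import Sequence


def ap_r40(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Average interpolated precision over 40 recall positions.

    `recalls[i]` and `precisions[i]` describe one operating point of a
    detector (hypothetical inputs for illustration).
    """
    if len(recalls) != len(precisions):
        raise ValueError("recalls and precisions must have the same length")

    total = 0.0
    for k in range(1, 41):
        threshold = k / 40
        # Interpolated precision: the best precision achieved at any
        # recall greater than or equal to the current threshold.
        reachable = [p for r, p in zip(recalls, precisions) if r >= threshold]
        total += max(reachable, default=0.0)
    return total / 40


if __name__ == "__main__":
    # Two operating points: precision 1.0 at recall 0.5, precision 0.5 at recall 1.0.
    print(ap_r40([0.5, 1.0], [1.0, 0.5]))
```

Under this definition, an average gain of 3.12% refers to a difference between two such scores, each reported as a percentage.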