Dynamic Depth Fusion and Transformation for Monocular 3D Object Detection

Erli Ouyang, Li Zhang, Mohan Chen, Anurag Arnab, Yanwei Fu
DOI: https://doi.org/10.1007/978-3-030-69525-5_21
2020-01-01
Abstract: Visual-based 3D detection has been drawing considerable attention recently. Despite the best efforts of computer vision researchers, visual-based 3D detection remains a largely unsolved problem. This is primarily due to the lack of the accurate depth perception that LiDAR sensors provide. Previous works struggle to fuse 3D spatial information with the RGB image effectively. In this paper, we propose a novel monocular 3D detection framework to address this problem. Specifically, we make two primary contributions: (i) We design an Adaptive Depth-guided Instance Normalization layer that leverages depth features to guide RGB features for high-quality estimation of 3D properties. (ii) We introduce a Dynamic Depth Transformation module that better recovers accurate depth through semantic context learning, and thus facilitates the removal of depth ambiguities that exist in the RGB image. Experiments show that our approach achieves state-of-the-art performance on the KITTI 3D detection benchmark among current monocular 3D detection works.
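The abstract describes depth features "guiding" RGB features through an instance-normalization layer but gives no formula. The sketch below illustrates one plausible reading of that general idea, not the authors' implementation: RGB features are instance-normalized per channel, then modulated by a scale (gamma) and shift (beta) predicted from depth features. The per-pixel 1x1 projection that produces gamma and beta, and the function name `depth_guided_instance_norm`, are hypothetical choices made here for illustration only.

```python
import numpy as np


def depth_guided_instance_norm(rgb_feat, depth_feat, w_gamma, b_gamma,
                               w_beta, b_beta, eps=1e-5):
    """Hypothetical depth-guided instance normalization (illustrative only).

    rgb_feat:   (C, H, W) RGB feature map for one image.
    depth_feat: (D, H, W) depth feature map aligned with rgb_feat.
    w_gamma, w_beta: (C, D) weights of an assumed 1x1 projection.
    b_gamma, b_beta: (C,) biases of that assumed projection.
    """
    # Instance normalization: per-channel statistics over the spatial axes.
    mean = rgb_feat.mean(axis=(1, 2), keepdims=True)
    std = np.sqrt(rgb_feat.var(axis=(1, 2), keepdims=True) + eps)
    normalized = (rgb_feat - mean) / std

    # Assumed design: depth features predict a per-pixel scale and shift.
    gamma = np.einsum("cd,dhw->chw", w_gamma, depth_feat) + b_gamma[:, None, None]
    beta = np.einsum("cd,dhw->chw", w_beta, depth_feat) + b_beta[:, None, None]

    # Modulate the normalized RGB features with the depth-derived parameters.
    return gamma * normalized + beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, D, H, W = 3, 2, 4, 5
    rgb = rng.normal(size=(C, H, W))
    depth = rng.normal(size=(D, H, W))
    w_g = rng.normal(scale=0.1, size=(C, D))
    w_b = rng.normal(scale=0.1, size=(C, D))
    out = depth_guided_instance_norm(rgb, depth, w_g, np.ones(C), w_b, np.zeros(C))
    print(out.shape)  # (3, 4, 5)
```

Predicting gamma and beta per pixel (rather than one value per channel) lets depth vary the modulation across the image, which is one way depth could inform local 3D estimates; a per-channel variant would instead pool the depth features spatially before projecting them.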