Disentangling Monocular 3D Object Detection

Andrea Simonelli, Samuel Rota Bulò, Lorenzo Porzi, Manuel López-Antequera, Peter Kontschieder
DOI: https://doi.org/10.48550/arXiv.1905.12365
2019-05-29
Abstract: In this paper we propose an approach for monocular 3D object detection from a single RGB image, which leverages a novel disentangling transformation for 2D and 3D detection losses and a novel, self-supervised confidence score for 3D bounding boxes. Our proposed loss disentanglement has the twofold advantage of simplifying the training dynamics in the presence of losses with complex interactions of parameters, and sidestepping the issue of balancing independent regression terms. Our solution overcomes these issues by isolating the contribution made by groups of parameters to a given loss, without changing its nature. We further apply loss disentanglement to another novel, signed Intersection-over-Union criterion-driven loss for improving 2D detection results. Besides our methodological innovations, we critically review the AP metric used in KITTI3D, which emerged as the most important dataset for comparing 3D detection results. We identify and resolve a flaw in the 11-point interpolated AP metric, which affects all previously published detection results and particularly biases the results of monocular 3D detection. We provide extensive experimental evaluations and ablation studies on the KITTI3D and nuScenes datasets, setting new state-of-the-art results on object category car by large margins.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses monocular 3D object detection from a single RGB image. The authors introduce a disentangling transformation for the 2D and 3D detection losses and a self-supervised confidence score for 3D bounding boxes. They also re-examine the average precision (AP) metric used on the KITTI3D benchmark, identify a flaw in it, and propose a correction.

### Main problems solved

1. **Interaction among complex parameters**:
   - In monocular 3D object detection, the 2D and 3D detection losses depend on several groups of parameters that interact in complex ways, which makes optimization difficult. The disentangling transformation isolates the contribution of each parameter group to the loss, which simplifies the training dynamics and sidesteps the problem of balancing independent regression terms.
   - Formula representation: \[ L_{\text{dis}}(y, \hat{y})=\sum_{j = 1}^{k}L\bigl(y,\psi(\hat{\theta}_j,\theta_{-j})\bigr), \] where \(L\) is the original loss function, \(\psi\) is a one-to-one mapping from the \(k\) parameter groups to the output space, \(\theta=\psi^{-1}(y)\) and \(\hat{\theta}=\psi^{-1}(\hat{y})\) are the ground-truth and predicted parameters, \(\hat{\theta}_j\) is the \(j\)-th predicted group, and \(\theta_{-j}\) denotes all remaining groups, taken from the ground truth. Each term therefore measures the error caused by one predicted group while every other group is held at its correct value.
2. **Confidence scoring of 3D bounding boxes**:
   - To obtain a confidence score for 3D bounding boxes without extra annotations, the authors use a self-supervised approach that converts the 3D detection loss into a value in the probability range \((0, 1]\): a low loss maps to a confidence near 1, and a high loss maps to a confidence near 0.
   - Formula representation: \[ \hat{p}_{3D|2D}=e^{-\frac{1}{T}L_{bb}^{3D}(B,\hat{B})}, \] where \(T>0\) is the temperature parameter, and \(L_{bb}^{3D}(B,\hat{B})\) is the 3D bounding box regression loss between the ground-truth box \(B\) and the predicted box \(\hat{B}\).
3. **Flaws in the KITTI3D AP metric**:
   - The authors show that the 11-point interpolated average precision (AP) metric used on KITTI3D has a flaw: a single correct detection gives precision 1 at the recall-0 sampling point and therefore already yields an AP of \(1/11 \approx 0.0909\). This overestimates model performance, and the bias is most visible for monocular 3D detection, where reported AP values are low.
   - To address this, the authors propose a corrected AP computation that samples 40 recall levels and excludes recall 0.

### Summary

The paper introduces loss disentanglement and a self-supervised 3D confidence score for monocular 3D object detection, and reports new state-of-the-art results for the car category on KITTI3D and nuScenes. Its critical review of the KITTI3D AP metric exposes a bias in previously published results and provides a corrected metric for future comparisons.
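
The disentangled loss from item 1 can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a 1D "box" whose output space is its two edges, two scalar parameter groups (center and width), and a plain L1 loss. It also assumes that each term takes the j-th group from the prediction and all other groups from the ground truth. The names `psi`, `psi_inv`, `l1_loss`, and `disentangled_loss` are hypothetical.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float]  # output space: (left, right) edges of a 1D box


def psi(theta: List[float]) -> Box:
    """Map the parameter groups (center, width) to box edges (toy assumption)."""
    center, width = theta
    return (center - width / 2.0, center + width / 2.0)


def psi_inv(box: Box) -> List[float]:
    """Recover the parameter groups (center, width) from box edges."""
    left, right = box
    return [(left + right) / 2.0, right - left]


def l1_loss(target: Box, pred: Box) -> float:
    """Original loss L: sum of absolute edge errors."""
    return sum(abs(t - p) for t, p in zip(target, pred))


def disentangled_loss(loss: Callable[[Box, Box], float],
                      target: Box, pred: Box) -> float:
    """Sum the loss over groups, swapping in one predicted group at a time."""
    theta = psi_inv(target)      # ground-truth parameter groups
    theta_hat = psi_inv(pred)    # predicted parameter groups
    total = 0.0
    for j in range(len(theta)):
        mixed = list(theta)      # start from the ground truth ...
        mixed[j] = theta_hat[j]  # ... and replace only group j with its prediction
        total += loss(target, psi(mixed))
    return total


target, pred = (0.0, 4.0), (1.0, 7.0)
plain = l1_loss(target, pred)                         # 4.0
disentangled = disentangled_loss(l1_loss, target, pred)  # 4.0 (center) + 2.0 (width) = 6.0
```

In this example the center error and the width error each get their own term, so neither can mask or amplify the other inside a single loss value.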
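
The confidence formula from item 2 can be evaluated directly. This is a minimal sketch that only maps a given loss value to a confidence; the function name `confidence_from_loss` and the default temperature are assumptions, and how the value is used during training is not covered by the summary above.

```python
import math


def confidence_from_loss(loss_3d: float, temperature: float = 1.0) -> float:
    """Map a non-negative 3D box loss to a confidence in (0, 1] via exp(-loss / T)."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    if loss_3d < 0.0:
        raise ValueError("loss must be non-negative")
    return math.exp(-loss_3d / temperature)


perfect = confidence_from_loss(0.0)                      # zero loss gives confidence 1.0
sharp = confidence_from_loss(2.0, temperature=1.0)       # exp(-2)
smooth = confidence_from_loss(2.0, temperature=4.0)      # exp(-0.5), larger than `sharp`
```

A larger temperature makes the score decay more slowly, so the same loss yields a higher confidence.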
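
The AP flaw from item 3 can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not the official KITTI evaluation code: it assumes a detector that outputs a single correct detection on a dataset with 1000 ground-truth objects, and it assumes the corrected metric samples 40 recall levels and leaves out recall 0. The helper `interpolated_ap` is hypothetical.

```python
from typing import List, Sequence, Tuple


def interpolated_ap(pr_points: Sequence[Tuple[float, float]],
                    recall_levels: Sequence[float]) -> float:
    """Average, over the recall levels, of the best precision at recall >= level."""
    total = 0.0
    for level in recall_levels:
        reachable = [prec for rec, prec in pr_points if rec >= level]
        total += max(reachable, default=0.0)  # 0 if the level is never reached
    return total / len(recall_levels)


# 11 recall levels including recall 0 (original metric).
R11: List[float] = [i / 10 for i in range(11)]
# 40 recall levels starting at 1/40, so recall 0 is excluded (assumed correction).
R40: List[float] = [i / 40 for i in range(1, 41)]

# One true positive out of 1000 ground-truth objects: recall 0.001, precision 1.0.
single_hit = [(1 / 1000, 1.0)]

ap_11 = interpolated_ap(single_hit, R11)  # 1/11, about 0.0909
ap_40 = interpolated_ap(single_hit, R40)  # 0.0
```

With recall 0 included, the lone detection is credited with one full sampling point; once that point is dropped, the same detector scores zero.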