Self-supervised 3D Vehicle Detection Based on Monocular Images

He Liu, Yi Sun
DOI: https://doi.org/10.1016/j.image.2024.117149
2024-01-01
Abstract: The deep learning-based literature on 3D object detection from monocular images has been dominated by methods that require supervision in the form of 3D bounding box annotations for training. However, obtaining sufficient 3D annotations is expensive, laborious and error-prone. To address this problem, we propose a monocular self-supervised approach to 3D object detection that relies solely on observed RGB data, rather than 3D bounding boxes, for training. We leverage differentiable rendering to apply visual alignment to depth maps, instance masks and point clouds for self-supervision. Furthermore, considering the complexity of autonomous driving scenes, we introduce a point cloud filter to reduce the impact of noise and design an automatic training set pruning strategy suited to the self-supervised framework to further improve network performance. We provide detailed experiments on the KITTI benchmark and achieve performance competitive with existing self-supervised methods as well as some fully supervised methods.
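The abstract mentions a point cloud filter for reducing noise but does not describe how it works. As a rough illustration of the general idea only, the sketch below drops points whose depth is an outlier under a median / median-absolute-deviation (MAD) test. This is a swapped-in generic technique, not the paper's filter; the function name `filter_points_by_depth` and the threshold `k` are hypothetical.

```python
from statistics import median


def filter_points_by_depth(points, k=3.0):
    """Keep points whose depth (z) lies within k MADs of the median depth.

    Hypothetical helper illustrating generic noise filtering; the paper's
    actual point cloud filter is not specified in the abstract.
    """
    if not points:
        return []
    depths = [p[2] for p in points]
    med = median(depths)
    # Median absolute deviation: a robust estimate of depth spread.
    mad = median([abs(d - med) for d in depths])
    # Note: if mad is 0, only points exactly at the median depth are kept.
    return [p for p in points if abs(p[2] - med) <= k * mad]


# Four points near 10 m and one stray point at 45 m (made-up values).
points = [
    (0.0, 0.0, 10.0),
    (0.1, 0.0, 10.2),
    (0.2, 0.1, 9.9),
    (0.1, 0.2, 10.1),
    (0.0, 0.1, 45.0),
]
kept = filter_points_by_depth(points)
```

A median-based test is used here because a single distant point barely shifts the median, whereas it would noticeably shift a mean-based threshold.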