A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation

Hamidreza Fazlali, Yixuan Xu, Yuan Ren, Bingbing Liu
DOI: https://doi.org/10.48550/arXiv.2203.02133
2022-03-04
Abstract: 3D object detection using LiDAR data is an indispensable component for autonomous driving systems. Yet, only a few LiDAR-based 3D object detection methods leverage segmentation information to further guide the detection process. In this paper, we propose a novel multi-task framework that jointly performs 3D object detection and panoptic segmentation. In our method, the 3D object detection backbone in Bird's-Eye-View (BEV) plane is augmented by the injection of Range-View (RV) feature maps from the 3D panoptic segmentation backbone. This enables the detection backbone to leverage multi-view information to address the shortcomings of each projection view. Furthermore, foreground semantic information is incorporated to ease the detection task by highlighting the locations of each object class in the feature maps. Finally, a new center density heatmap generated based on the instance-level information further guides the detection backbone by suggesting possible box center locations for objects. Our method works with any BEV-based 3D object detection method, and as shown by extensive experiments on the nuScenes dataset, it provides significant performance gains. Notably, the proposed method based on a single-stage CenterPoint 3D object detection network achieved state-of-the-art performance on the nuScenes 3D Detection Benchmark with 67.3 NDS.
Computer Vision and Pattern Recognition
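The abstract's central mechanism, injecting RV feature maps into the BEV detection backbone, can be illustrated with a minimal, dependency-free Python sketch. It assumes that each LiDAR point is mapped to a range-image pixel by a standard spherical projection, that the RV feature at that pixel is max-pooled into the point's BEV cell, and that the result is concatenated channel-wise with the existing BEV features. The function names (`range_view_index`, `rv_to_bev`, `concat_channels`), the field-of-view defaults, and the max-pooling and concatenation choices are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch of RV-to-BEV feature injection; not the paper's code.
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Grid = List[List[List[float]]]  # [rows][cols][channels]


def range_view_index(p: Point, height: int, width: int,
                     fov_up_deg: float = 10.0,
                     fov_down_deg: float = -30.0) -> Tuple[int, int]:
    """Spherical projection of a point (assumed non-zero range) to a range-image pixel.

    The vertical field-of-view defaults are hypothetical placeholders.
    """
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    yaw = math.atan2(y, x)            # azimuth in [-pi, pi]
    pitch = math.asin(z / r)          # elevation angle
    fov_up, fov_down = math.radians(fov_up_deg), math.radians(fov_down_deg)
    col = int(0.5 * (1.0 - yaw / math.pi) * width)
    row = int((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * height)
    # Clamp to valid pixel indices (points outside the FOV land on the border).
    return min(max(row, 0), height - 1), min(max(col, 0), width - 1)


def rv_to_bev(points: Sequence[Point], rv_features: Grid,
              rv_hw: Tuple[int, int], bev_hw: Tuple[int, int],
              cell_size: float, x_min: float, y_min: float) -> Grid:
    """Gather each point's RV feature and max-pool it into the point's BEV cell.

    Cells start at 0.0, which assumes non-negative (e.g. post-ReLU) features.
    """
    height, width = rv_hw
    bh, bw = bev_hw
    channels = len(rv_features[0][0])
    bev = [[[0.0] * channels for _ in range(bw)] for _ in range(bh)]
    for p in points:
        row, col = range_view_index(p, height, width)
        feat = rv_features[row][col]
        i = math.floor((p[0] - x_min) / cell_size)
        j = math.floor((p[1] - y_min) / cell_size)
        if 0 <= i < bh and 0 <= j < bw:
            cell = bev[i][j]
            for c in range(channels):
                cell[c] = max(cell[c], feat[c])
    return bev


def concat_channels(bev_features: Grid, rv_bev: Grid) -> Grid:
    """Inject the projected RV features by channel-wise concatenation per BEV cell."""
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(bev_features, rv_bev)]


# Usage: one point straight ahead of the sensor at x = 1 m.
rv = [[[1.0], [2.0], [3.0], [4.0]],
      [[5.0], [6.0], [7.0], [8.0]]]          # 2 x 4 range image, 1 channel
projected = rv_to_bev([(1.0, 0.0, 0.0)], rv, (2, 4), (2, 2), 1.0, 0.0, -1.0)
fused = concat_channels([[[9.0], [9.0]], [[9.0], [9.0]]], projected)
print(fused[1][1])  # [9.0, 3.0]
```

Gathering per point keeps the correspondence between a range-image pixel and a BEV cell explicit; a real network would perform the same gather and scatter on tensors, and the paper may aggregate or fuse the two views differently.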
What problem does this paper attempt to address?
This paper addresses how to make better use of segmentation information to improve LiDAR-based 3D object detection in autonomous driving systems. Existing LiDAR-based 3D object detection methods rarely use segmentation information to guide the detection process. The paper proposes a multi-task framework that jointly performs 3D object detection and panoptic segmentation. Range-View (RV) feature maps from the panoptic segmentation backbone are injected into the 3D object detection backbone, which operates on the Bird's-Eye-View (BEV) plane, so that multi-view information compensates for the shortcomings of each individual projection view. In addition, foreground semantic information highlights the locations of each object class in the feature maps, and a new center density heatmap, generated from instance-level information, further guides the detection backbone by suggesting possible box center locations. The method can be combined with any BEV-based 3D object detection method. Experiments on the nuScenes dataset show significant performance gains; in particular, when built on the single-stage CenterPoint network, the method achieves state-of-the-art performance of 67.3 NDS on the nuScenes 3D detection benchmark.

In short, the main contributions of this paper are:
1. A multi-task framework that simultaneously learns 3D panoptic segmentation and 3D object detection to improve 3D object recognition and localization. To the best of the authors' knowledge, this is the first framework to use both semantic-level and instance-level information to improve 3D object detection.
2. A plug-and-play design that can be attached to any BEV-based object detection method to improve detection performance.
3. Extensive experiments on the nuScenes dataset, which contains both panoptic and 3D box annotations, verifying the effectiveness of the method.
4. Ablation studies that investigate the contribution of each newly added component to the performance improvement.
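The two guidance signals described above, foreground semantic maps and the center density heatmap, can be sketched in the same dependency-free style. This is one plausible reading under stated assumptions: each foreground class gets a binary BEV channel marking cells that contain points of that class, and the center density heatmap is built by summing a truncated Gaussian around each predicted instance center (for example, centers derived from the panoptic segmentation output) and rescaling the result to [0, 1]. The Gaussian splatting, the `sigma` and `radius` parameters, the normalization, and all function names are hypothetical; the paper's exact construction may differ.

```python
# Hypothetical sketch of foreground class maps and a center density heatmap.
import math
from typing import Dict, List, Sequence, Tuple

Map2D = List[List[float]]


def bev_cell(x: float, y: float, cell_size: float,
             x_min: float, y_min: float) -> Tuple[int, int]:
    """Map a metric (x, y) position to integer BEV grid indices."""
    return math.floor((x - x_min) / cell_size), math.floor((y - y_min) / cell_size)


def foreground_class_maps(points: Sequence[Tuple[float, float, float]],
                          labels: Sequence[str], classes: Sequence[str],
                          bev_hw: Tuple[int, int], cell_size: float,
                          x_min: float, y_min: float) -> Dict[str, Map2D]:
    """One binary BEV map per foreground class: 1.0 wherever a point of that class falls."""
    bh, bw = bev_hw
    maps = {c: [[0.0] * bw for _ in range(bh)] for c in classes}
    for (x, y, _z), label in zip(points, labels):
        if label not in maps:
            continue  # background labels (e.g. road) are not highlighted
        i, j = bev_cell(x, y, cell_size, x_min, y_min)
        if 0 <= i < bh and 0 <= j < bw:
            maps[label][i][j] = 1.0
    return maps


def center_density_heatmap(centers: Sequence[Tuple[float, float]],
                           bev_hw: Tuple[int, int], cell_size: float,
                           x_min: float, y_min: float,
                           sigma: float = 1.0, radius: int = 2) -> Map2D:
    """Sum a truncated Gaussian around every predicted instance center, then scale to [0, 1].

    Cells where several center predictions agree receive a higher density.
    """
    bh, bw = bev_hw
    heat = [[0.0] * bw for _ in range(bh)]
    for cx, cy in centers:
        ci, cj = bev_cell(cx, cy, cell_size, x_min, y_min)
        for i in range(ci - radius, ci + radius + 1):
            for j in range(cj - radius, cj + radius + 1):
                if 0 <= i < bh and 0 <= j < bw:
                    d2 = (i - ci) ** 2 + (j - cj) ** 2
                    heat[i][j] += math.exp(-d2 / (2.0 * sigma ** 2))
    peak = max(max(row) for row in heat)
    if peak > 0.0:
        heat = [[v / peak for v in row] for row in heat]
    return heat


# Usage: two center predictions agreeing on the same cell of a 5 x 5 grid.
heat = center_density_heatmap([(2.5, 2.5), (2.4, 2.6)], (5, 5), 1.0, 0.0, 0.0)
print(heat[2][2])  # 1.0
```

In such a design, the class maps and the heatmap would be appended as extra BEV input channels alongside the backbone features, so the detector can attend to cells that are likely to contain objects or object centers.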