Abstract: 3D object detection using LiDAR data is an indispensable component for autonomous driving systems. Yet, only a few LiDAR-based 3D object detection methods leverage segmentation information to further guide the detection process. In this paper, we propose a novel multi-task framework that jointly performs 3D object detection and panoptic segmentation. In our method, the 3D object detection backbone in the Bird's-Eye-View (BEV) plane is augmented by the injection of Range-View (RV) feature maps from the 3D panoptic segmentation backbone. This enables the detection backbone to leverage multi-view information to address the shortcomings of each projection view. Furthermore, foreground semantic information is incorporated to ease the detection task by highlighting the locations of each object class in the feature maps. Finally, a new center density heatmap generated based on the instance-level information further guides the detection backbone by suggesting possible box center locations for objects. Our method works with any BEV-based 3D object detection method, and as shown by extensive experiments on the nuScenes dataset, it provides significant performance gains. Notably, the proposed method based on a single-stage CenterPoint 3D object detection network achieved state-of-the-art performance on the nuScenes 3D Detection Benchmark with 67.3 NDS.
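The injection of RV feature maps into the BEV backbone can be pictured as a point-wise transfer: every LiDAR point has a pixel in the range image and a cell in the BEV grid, so RV features can be gathered per point and scattered onto the BEV plane. The NumPy sketch below illustrates only that idea. The function name `rv_features_to_bev`, the max-pooling of points that share a cell, and the zero fill for empty cells are all assumptions made for illustration, not details taken from the abstract.

```python
import numpy as np

def rv_features_to_bev(rv_feats, rv_idx, bev_idx, bev_shape):
    """Scatter range-view features onto a BEV grid (hypothetical helper).

    rv_feats : (C, H_rv, W_rv) float array, range-view feature map
    rv_idx   : (N, 2) int array, (row, col) of each point in the range image
    bev_idx  : (N, 2) int array, (row, col) of each point in the BEV grid
    bev_shape: (H_bev, W_bev)
    """
    C = rv_feats.shape[0]
    H, W = bev_shape
    # Gather one feature vector per point from the range-view map: (N, C).
    point_feats = rv_feats[:, rv_idx[:, 0], rv_idx[:, 1]].T.astype(np.float32)
    # Flatten the BEV (row, col) index so several points can share one cell.
    flat = bev_idx[:, 0] * W + bev_idx[:, 1]
    bev = np.full((H * W, C), -np.inf, dtype=np.float32)
    # Assumption: points falling in the same cell are reduced by max pooling.
    np.maximum.at(bev, flat, point_feats)
    # Cells that received no point are set to zero.
    occupied = np.zeros(H * W, dtype=bool)
    occupied[flat] = True
    bev[~occupied] = 0.0
    return bev.T.reshape(C, H, W)
```

The result has the same spatial size as the BEV feature map, so one plausible way to use it is to concatenate it with the detection backbone's own features along the channel axis, for example `np.concatenate([bev_feats, injected], axis=0)`. Whether the paper fuses the two views this way is not stated in the abstract.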

What problem does this paper attempt to address?

This paper asks how segmentation information can be combined more effectively with LiDAR-based 3D object detection to improve detection performance in autonomous driving systems. Existing LiDAR-based 3D object detection methods rarely use segmentation information to guide the detection process. The paper proposes a multi-task framework that jointly performs 3D object detection and panoptic segmentation. Range-View (RV) feature maps from the panoptic segmentation backbone are injected into the 3D object detection backbone, which operates in the Bird's-Eye-View (BEV) plane, so that multi-view information compensates for the shortcomings of each projection view. Foreground semantic information highlights the locations of each object class in the feature maps, and a new center density heatmap, generated from instance-level information, further guides the detection backbone by suggesting possible box center locations. The method can be combined with any BEV-based 3D object detection method, and experiments on the nuScenes dataset show significant performance gains. Notably, the variant built on a single-stage CenterPoint 3D object detection network achieved state-of-the-art performance of 67.3 NDS on the nuScenes 3D detection benchmark.

In short, the main contributions of this paper are as follows:

1. A multi-task framework that jointly learns 3D panoptic segmentation and 3D object detection to improve 3D object recognition and localization. To the best of the authors' knowledge, this is the first framework to use both semantic-level and instance-level information to improve 3D object detection.
2. A design that can be attached to any BEV-based object detection method as a plug-and-play solution to improve detection performance.
3. Verification of the method's effectiveness through extensive experiments on the nuScenes dataset, which contains both panoptic and 3D box annotations.
4. Ablation studies that investigate the contribution of each newly added component to the performance improvement.
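To make the center density heatmap more concrete, here is a minimal NumPy sketch of how instance-level segmentation output could be turned into a BEV map that peaks at likely box centers. The summary does not say how the heatmap is generated, so everything specific here is an assumption: the function name `center_density_heatmap`, taking the mean BEV position of an instance's points as its center estimate, the Gaussian kernel with width `sigma`, and the convention that instance id 0 means "no instance".

```python
import numpy as np

def center_density_heatmap(bev_idx, instance_ids, bev_shape, sigma=1.0):
    """Build a BEV heatmap with one Gaussian peak per instance (hypothetical helper).

    bev_idx     : (N, 2) int array, (row, col) BEV cell of each point
    instance_ids: (N,) int array, instance label per point; 0 = no instance
    bev_shape   : (H, W)
    """
    H, W = bev_shape
    rows, cols = np.mgrid[0:H, 0:W]
    heatmap = np.zeros((H, W), dtype=np.float64)
    for inst in np.unique(instance_ids):
        if inst == 0:
            continue  # skip points that belong to no instance
        # Assumption: the instance center is the mean BEV position of its points.
        center = bev_idx[instance_ids == inst].mean(axis=0)
        d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
        # Keep the strongest response per cell so nearby instances do not sum past 1.
        heatmap = np.maximum(heatmap, np.exp(-d2 / (2.0 * sigma ** 2)))
    return heatmap.astype(np.float32)
```

A map like this could be supplied to the detection backbone as an extra input channel, which is one way to read "suggesting possible box center locations". The mean of the observed points is only a rough center estimate, since LiDAR usually sees one side of an object.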

A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation

A Multi-view 3D Vehicle Detection Method Based On Novel 3D Proposal Generation Method

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection

3M3D: Multi-view, Multi-path, Multi-representation for 3D Object Detection

AOP-Net: All-in-One Perception Network for Joint LiDAR-based 3D Object Detection and Panoptic Segmentation

SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection

3D Object Detection for Point Cloud in Virtual Driving Environment

Towards Efficient 3D Object Detection in Bird's-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach

Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View

Multi-View 3D Object Detection Network for Autonomous Driving

Location-Guided LiDAR-Based Panoptic Segmentation for Autonomous Driving

Small, Versatile and Mighty: A Range-View Perception Framework

Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images

Three-Dimensional Point Cloud Object Detection Based on Feature Fusion and Enhancement

OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection

Multimodal Virtual Point 3D Detection

OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection

GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection