Abstract: With the rapid advances in autonomous driving, it has become critical to equip its sensing system with more holistic 3D perception. However, existing works focus on parsing either the objects (e.g., cars and pedestrians) or the scenes (e.g., trees and buildings) from the LiDAR sensor. In this work, we address the task of LiDAR-based panoptic segmentation, which aims to parse both objects and scenes in a unified manner. As one of the first endeavors towards this new and challenging task, we propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm. In particular, DS-Net has three appealing properties: 1) Strong backbone design. DS-Net adopts the cylinder convolution, which is specifically designed for LiDAR point clouds. 2) Dynamic Shifting for complex point distributions. We observe that commonly used clustering algorithms are incapable of handling complex autonomous driving scenes with non-uniform point cloud distributions and varying instance sizes. Thus, we present an efficient learnable clustering module, dynamic shifting, which adapts kernel functions on the fly for different instances. 3) Extension to 4D prediction. We extend DS-Net to 4D panoptic LiDAR segmentation through temporally unified instance clustering on aligned LiDAR frames. To comprehensively evaluate the performance of LiDAR-based panoptic segmentation, we construct and curate benchmarks from two large-scale autonomous driving LiDAR datasets, SemanticKITTI and nuScenes. Extensive experiments demonstrate that our proposed DS-Net achieves higher accuracy than current state-of-the-art methods in both tasks. Notably, in the single-frame version of the task, we outperform the SOTA method by 1.8% in terms of the PQ metric. In the 4D version of the task, we surpass the second-place method by 5.4% in terms of the LSTQ metric.
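
The abstract does not spell out how dynamic shifting is computed. As a rough illustration only, the idea of adapting the kernel per instance can be sketched as one mean-shift step that blends several flat kernels of different bandwidths. Everything below is an assumption made for illustration: the function names `flat_kernel_shift` and `dynamic_shift_step` are hypothetical, the bandwidths are arbitrary, and the per-point weights are supplied by hand, whereas a learnable module would presumably predict them from point features.

```python
import math


def flat_kernel_shift(points, bandwidth):
    """One mean-shift step with a flat kernel: each point moves to the
    mean of all points within `bandwidth` of it (itself included)."""
    shifted = []
    for p in points:
        neighbours = [q for q in points if math.dist(p, q) <= bandwidth]
        shifted.append(tuple(
            sum(q[d] for q in neighbours) / len(neighbours)
            for d in range(len(p))
        ))
    return shifted


def dynamic_shift_step(points, bandwidths, weights):
    """Blend the shift targets of several bandwidths, point by point.

    weights[i][j] is the weight point i gives to bandwidths[j]; each row
    is assumed to sum to 1. Here the weights are passed in by the caller
    (an assumption); a learned module would predict them instead.
    """
    candidates = [flat_kernel_shift(points, b) for b in bandwidths]
    blended = []
    for i, p in enumerate(points):
        blended.append(tuple(
            sum(weights[i][j] * candidates[j][i][d]
                for j in range(len(bandwidths)))
            for d in range(len(p))
        ))
    return blended


# Toy regressed centers: two close together, one far away.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
bandwidths = [0.5, 2.0]
# Hand-picked weights: point 0 trusts the wide kernel, point 1 mixes both,
# point 2 trusts the narrow kernel.
weights = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
result = dynamic_shift_step(points, bandwidths, weights)
```

With these toy inputs, the first two points are pulled towards each other while the distant point stays put, which illustrates why a single fixed bandwidth can struggle when instance sizes and point densities vary.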

Enhancing Moving Object Segmentation with Spatio-Temporal Information Fusion

ObjectFusion: an Object Detection and Segmentation Framework with RGB-D SLAM and Convolutional Neural Networks

Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation

LiDAR-Based Real-Time Panoptic Segmentation via Spatiotemporal Sequential Data Fusion

Dual-Branch Dynamic Object Segmentation Network Based on Spatio-Temporal Information Fusion

LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation

Learning Spatial and Temporal Variations for 4D Point Cloud Segmentation

Multi-sensor fusion for robust localization with moving object segmentation in complex dynamic 3D scenes

MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation

TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation

MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with Bird's Eye View based Appearance and Motion Features

SIESEF-FusionNet: Spatial Inter-correlation Enhancement and Spatially-Embedded Feature Fusion Network for LiDAR Point Cloud Semantic Segmentation

Improved 3D Semantic Segmentation Model Based on RGB Image and LiDAR Point Cloud Fusion for Automantic Driving

Instance Segmentation of Sparse Point Clouds with Spatio-Temporal Coding for Autonomous Robot

LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment

Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences

RGB and LiDAR Fusion-based 3D Semantic Segmentation for Autonomous Driving

A spatially enhanced network with camera-lidar fusion for 3D semantic segmentation

MF-MOS: A Motion-Focused Model for Moving Object Segmentation

Robust 3D Semantic Segmentation Based on Multi-Phase Multi-Modal Fusion for Intelligent Vehicles

LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network