Abstract: Identifying moving objects is an essential capability for autonomous systems, as it provides critical information for pose estimation, navigation, collision avoidance, and static map construction. In this paper, we present MotionBEV, a fast and accurate framework for LiDAR moving object segmentation, which segments moving objects with appearance and motion features in the bird's eye view (BEV) domain. Our approach converts 3D LiDAR scans into a 2D polar BEV representation to improve computational efficiency. Specifically, we learn appearance features with a simplified PointNet and compute motion features through the height differences of consecutive frames of point clouds projected onto vertical columns in the polar BEV coordinate system. We employ a dual-branch network bridged by the Appearance-Motion Co-attention Module (AMCM) to adaptively fuse the spatio-temporal information from appearance and motion features. Our approach achieves state-of-the-art performance on the SemanticKITTI-MOS benchmark. Furthermore, to demonstrate the practical effectiveness of our method, we provide a LiDAR-MOS dataset recorded by a solid-state LiDAR, which features non-repetitive scanning patterns and a small field of view.
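To make the polar BEV projection and the height-difference motion cue more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the grid resolution, the 50 m range, the use of the per-cell maximum height as the column statistic, and the function names `polar_bev_height` and `height_difference` are all assumptions made for illustration. It also assumes the previous scan has already been transformed into the current scan's coordinate frame.

```python
import numpy as np


def polar_bev_height(points, grid=(480, 360), max_range=50.0):
    """Project an (N, 3) point cloud onto a polar BEV grid of (rings, sectors).

    Returns a (rings, sectors) map holding the maximum z of the points in each
    cell, with NaN for empty cells. Grid size, range, and the max-height
    statistic are illustrative assumptions, not values from the paper.
    """
    n_r, n_a = grid
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)            # radial distance in the ground plane
    phi = np.arctan2(y, x)          # azimuth in [-pi, pi]
    keep = rho < max_range          # drop points beyond the grid
    r_idx = np.clip((rho[keep] / max_range * n_r).astype(int), 0, n_r - 1)
    a_idx = np.clip(((phi[keep] + np.pi) / (2 * np.pi) * n_a).astype(int), 0, n_a - 1)
    height = np.full(grid, -np.inf)
    # Unbuffered per-cell maximum: correct even when many points share a cell.
    np.maximum.at(height, (r_idx, a_idx), z[keep])
    height[np.isneginf(height)] = np.nan
    return height


def height_difference(curr, prev):
    """Motion cue: absolute per-cell height change; 0 where either cell is empty.

    Assumes `prev` was built from a scan already aligned to the current
    scan's coordinate frame (for example, using known poses).
    """
    return np.nan_to_num(np.abs(curr - prev), nan=0.0)
```

Because the cue compares column heights rather than raw point positions, a cell only lights up when the occupied height of that column changes, for example when a vehicle moves into or out of it. Comparing per-column statistics is also why such a cue depends less on how many points happen to fall into a distant, sparsely sampled cell.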

What problem does this paper attempt to address?

The problem this paper addresses is how to identify moving objects accurately and efficiently in the LiDAR point clouds used by autonomous driving systems. Specifically, the paper proposes a new framework named MotionBEV, which segments moving objects in LiDAR (Light Detection and Ranging) point clouds using appearance and motion features computed in the bird's-eye view (BEV). This task is crucial for applications such as pose estimation, navigation, collision avoidance, and static map construction.

### Main contributions of the paper:

1. **A BEV-based method**: The method uses high-quality spatio-temporal information from appearance and motion features to perform LiDAR moving object segmentation. It uses a simplified PointNet to learn the appearance features of each grid cell and extracts motion features from the height differences of vertical columns across consecutive scans. This BEV-based motion feature is robust to changes in distance.
2. **A two-branch network**: The network adaptively fuses appearance and motion features through the Appearance-Motion Co-attention Module (AMCM). AMCM dynamically assigns importance weights to the appearance and motion features to balance their contributions. It also enhances the appearance features through an attention mechanism, so that the two feature types are fused effectively and reinforce each other.
3. **State-of-the-art performance on the SemanticKITTI-MOS benchmark**: The method reaches 69.7% IoU (Intersection over Union) on the moving class, with an average inference time of 23 ms on an RTX 3090 GPU. It is also evaluated on a dataset recorded by a solid-state LiDAR, demonstrating its practical effectiveness with non-repetitive scanning patterns and a small field of view.
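The idea of weighting the two branches and letting motion guide the appearance features can be illustrated with a small NumPy sketch. This is a generic gated fusion written under stated assumptions, not the paper's AMCM: the softmax gate over globally pooled features, the sigmoid spatial mask derived from the motion map, and the parameters `w_gate` and `b_gate` are hypothetical stand-ins for the learned convolutional layers of the real module.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def co_attention_fuse(app, mot, w_gate, b_gate):
    """Hypothetical gated fusion of appearance and motion feature maps.

    app, mot: (C, H, W) feature maps from the two branches.
    w_gate:   (2, 2C) and b_gate: (2,) are illustrative learned parameters
              mapping globally pooled features to one score per branch.
    Returns the fused (C, H, W) map and the two branch weights.
    """
    # Global average pooling of both branches, concatenated: shape (2C,).
    pooled = np.concatenate([app.mean(axis=(1, 2)), mot.mean(axis=(1, 2))])
    scores = w_gate @ pooled + b_gate                  # one score per branch
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                           # softmax: weights sum to 1
    # Illustrative motion-guided attention: a spatial mask computed from the
    # motion map re-weights appearance features, emphasising moving regions.
    mask = sigmoid(mot.mean(axis=0, keepdims=True))    # shape (1, H, W)
    app_enhanced = app * (1.0 + mask)
    fused = weights[0] * app_enhanced + weights[1] * mot
    return fused, weights
```

With zero gate parameters the two branches receive equal weights of 0.5; a large bias on one branch shifts almost all of the weight to it, which is the kind of adaptive balancing the contribution describes.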
### Method overview:

- **Input representation**: The 3D LiDAR point cloud is projected into a BEV image in polar coordinates to improve computational efficiency. Each point is assigned to a grid cell according to its polar coordinates.
- **Motion feature generation**: Motion features are generated from the height differences between the BEV images of adjacent scans within a time window. This reduces the influence of point-cloud sparsity and is insensitive to changes in distance.
- **Network structure**: A two-branch network based on PolarNet adaptively fuses appearance and motion features through AMCM. AMCM consists of a Co-Attention Gate (CAG) and a Motion-Guided Attention Module (MGA), which balance the contributions of the two feature types and suppress redundant and misleading information.

### Experimental results:

- **Quantitative comparison**: On the SemanticKITTI-MOS benchmark, MotionBEV achieves the best result among methods that use only moving-object labels, with a moving-class IoU of 69.7%.
- **Qualitative comparison**: Compared with methods such as LMNet, MotionSeg3D, and 4DMOS, MotionBEV produces clearer boundaries and more accurate segmentation of dynamic objects.
- **Ablation experiments**: Experiments with different feature-fusion methods and motion-feature inputs verify the effectiveness of AMCM; the results show that AMCM significantly improves the model's performance.

### Conclusion:

MotionBEV is an efficient and accurate LiDAR moving object segmentation method that combines BEV-based appearance and motion features. Its performance on multiple datasets demonstrates its practicality and effectiveness for autonomous driving systems.
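The 69.7% moving-class IoU quoted in the experimental results is the standard Intersection over Union for a single class: true positives divided by the sum of true positives, false positives, and false negatives. The short sketch below computes it for per-point binary labels (1 = moving). The function name `moving_iou` is hypothetical, and benchmark figures are typically computed from counts accumulated over all evaluated scans (equivalently, by concatenating the labels of every scan) rather than by averaging per-scan IoUs.

```python
import numpy as np


def moving_iou(pred, gt):
    """IoU of the 'moving' class for per-point binary labels (1 = moving)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.count_nonzero(pred & gt)     # moving points predicted as moving
    fp = np.count_nonzero(pred & ~gt)    # static points predicted as moving
    fn = np.count_nonzero(~pred & gt)    # moving points that were missed
    denom = tp + fp + fn
    return tp / denom if denom else float("nan")  # undefined with no moving points
```

For example, `moving_iou([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])` has 2 true positives, 1 false positive, and 1 false negative, so it returns 0.5.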

MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with Bird's Eye View based Appearance and Motion Features

LiDAR Video Object Segmentation with Dynamic Kernel Refinement

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

Moving Object Segmentation in 3D LiDAR Data: A Learning-based Approach Exploiting Sequential Data

Receding Moving Object Segmentation in 3D LiDAR Data Using Sparse 4D Convolutions

BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud

Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation

OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation

MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation

BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks

BEV-MODNet: Monocular Camera based Bird's Eye View Moving Object Detection for Autonomous Driving

BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation

GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection

MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps

LiDAR Panoptic Segmentation for Autonomous Driving

Location-Guided LiDAR-Based Panoptic Segmentation for Autonomous Driving

Semantics-Guided Moving Object Segmentation with 3D LiDAR

3D-SeqMOS: A Novel Sequential 3D Moving Object Segmentation in Autonomous Driving

Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation

LiDAR-based Panoptic Segmentation via Dynamic Shifting Network