MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with Bird's Eye View based Appearance and Motion Features

Bo Zhou, Jiapeng Xie, Yan Pan, Jiajie Wu, Chuanzhao Lu
DOI: https://doi.org/10.1109/LRA.2023.3325687
2023-10-10
Abstract: Identifying moving objects is an essential capability for autonomous systems, as it provides critical information for pose estimation, navigation, collision avoidance, and static map construction. In this paper, we present MotionBEV, a fast and accurate framework for LiDAR moving object segmentation, which segments moving objects with appearance and motion features in the bird's eye view (BEV) domain. Our approach converts 3D LiDAR scans into a 2D polar BEV representation to improve computational efficiency. Specifically, we learn appearance features with a simplified PointNet and compute motion features through the height differences of consecutive frames of point clouds projected onto vertical columns in the polar BEV coordinate system. We employ a dual-branch network bridged by the Appearance-Motion Co-attention Module (AMCM) to adaptively fuse the spatio-temporal information from appearance and motion features. Our approach achieves state-of-the-art performance on the SemanticKITTI-MOS benchmark. Furthermore, to demonstrate the practical effectiveness of our method, we provide a LiDAR-MOS dataset recorded by a solid-state LiDAR, which features non-repetitive scanning patterns and a small field of view.
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
This paper addresses the problem of accurately identifying moving objects in LiDAR (Light Detection and Ranging) point clouds for autonomous systems. It proposes MotionBEV, a framework that segments moving objects efficiently and accurately using appearance and motion features in the bird's eye view (BEV) domain. The task is crucial for applications such as pose estimation, navigation, collision avoidance, and static map construction.

### Main contributions of the paper:

1. **Proposing a BEV-based method**: The method performs LiDAR moving object segmentation from appearance and motion features that carry high-quality spatio-temporal information. It uses a simplified version of PointNet to learn the appearance features of each grid cell and extracts motion features from the height differences of vertical columns across consecutive frames. This BEV-based motion feature is robust to changes in distance.
2. **Designing a dual-branch network**: The network adaptively fuses appearance and motion features through the Appearance-Motion Co-attention Module (AMCM). AMCM dynamically assigns importance weights to the appearance and motion features to balance their contributions. It also enhances the appearance features through an attention mechanism, so that the two feature types are fused effectively and reinforce each other.
3. **Achieving state-of-the-art performance on the SemanticKITTI-MOS benchmark**: The method reaches an IoU (Intersection over Union) of 69.7% on the moving class, with an average inference time of 23 milliseconds on an RTX 3090 GPU. It is also evaluated on a dataset recorded by a solid-state LiDAR, demonstrating its practical effectiveness with non-repetitive scanning patterns and a small field of view.
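The height-difference motion feature from contribution 1 can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that both scans are already aligned to a common coordinate frame, that each polar BEV cell stores its height extent (maximum z minus minimum z), and that the motion feature is the absolute per-cell difference between two frames rather than between time windows. The function names `polar_bev_height` and `height_difference`, the grid sizes, and the toy scans are all hypothetical.

```python
import math

def polar_bev_height(points, r_max=50.0, n_r=50, n_theta=360):
    """Bin (x, y, z) points into polar BEV cells; return each cell's height extent."""
    z_range = {}  # (radial bin, angular bin) -> (min z, max z)
    for x, y, z in points:
        r = math.hypot(x, y)
        if r >= r_max:
            continue  # drop points outside the grid
        theta = math.atan2(y, x) % (2.0 * math.pi)  # angle in [0, 2*pi)
        i = min(int(r / r_max * n_r), n_r - 1)
        j = min(int(theta / (2.0 * math.pi) * n_theta), n_theta - 1)
        lo, hi = z_range.get((i, j), (z, z))
        z_range[(i, j)] = (min(lo, z), max(hi, z))
    return {cell: hi - lo for cell, (lo, hi) in z_range.items()}

def height_difference(curr, prev):
    """Absolute per-cell change in height extent; empty cells count as 0 (assumption)."""
    return {c: abs(curr.get(c, 0.0) - prev.get(c, 0.0)) for c in set(curr) | set(prev)}

# Toy scans: a static wall plus one object that changes place between frames.
prev_scan = [(10.0, 0.1, 0.0), (10.0, 0.1, 2.0), (5.0, 3.0, 0.0), (5.0, 3.0, 1.5)]
curr_scan = [(10.0, 0.1, 0.0), (10.0, 0.1, 2.0), (5.0, -3.0, 0.0), (5.0, -3.0, 1.5)]
motion = height_difference(polar_bev_height(curr_scan), polar_bev_height(prev_scan))
moving_cells = sorted(c for c, d in motion.items() if d > 0.0)
print(moving_cells)
```

In this toy case the wall cell has the same height extent in both frames, so its difference is zero. The cell the object vacated and the cell it newly occupies both receive a non-zero value, which is the kind of cue a motion branch could use.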
### Method overview:

- **Input representation**: The 3D LiDAR point cloud is projected into a bird's eye view (BEV) image in polar coordinates to improve computational efficiency. Each point is assigned to a grid cell according to its polar coordinates.
- **Motion feature generation**: Motion features are generated by computing the height differences between BEV images in adjacent time windows. This reduces the influence of point cloud sparsity and is insensitive to changes in distance.
- **Network structure**: A dual-branch network based on PolarNet adaptively fuses appearance and motion features through AMCM. AMCM comprises a Co-Attention Gate (CAG) and a Motion-Guided Attention Module (MGA), which balance the contributions of the multi-modal features and suppress redundant and misleading information.

### Experimental results:

- **Quantitative comparison**: On the SemanticKITTI-MOS benchmark, MotionBEV achieves the best results while using only moving-object labels, with an IoU of 69.7%.
- **Qualitative comparison**: Compared with methods such as LMNet, MotionSeg3D, and 4DMOS, MotionBEV produces clearer boundaries and higher accuracy when segmenting dynamic objects.
- **Ablation experiments**: The effectiveness of AMCM is verified by comparing different feature fusion methods and motion feature inputs. The results show that AMCM significantly improves model performance.

### Conclusion:

MotionBEV provides an efficient and accurate method for LiDAR moving object segmentation by combining BEV-based appearance and motion features. Its performance on multiple datasets demonstrates its practicality and effectiveness for autonomous driving systems.
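For reference, the moving-class IoU quoted in the results follows the standard definition TP / (TP + FP + FN), counted over points labelled or predicted as moving. The sketch below only illustrates that formula and is not the benchmark's evaluation code. The function name `moving_iou`, the per-point 0/1 encoding, and the choice to return 0.0 when no point is moving in either input are assumptions.

```python
def moving_iou(pred, label):
    """IoU of the moving class from per-point binary masks (1 = moving, 0 = static)."""
    if len(pred) != len(label):
        raise ValueError("pred and label must have the same length")
    tp = sum(1 for p, l in zip(pred, label) if p and l)      # moving, predicted moving
    fp = sum(1 for p, l in zip(pred, label) if p and not l)  # static, predicted moving
    fn = sum(1 for p, l in zip(pred, label) if not p and l)  # moving, predicted static
    denom = tp + fp + fn
    # Assumption: report 0.0 when neither mask contains a moving point.
    return tp / denom if denom else 0.0

# Toy example with five points: 2 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 0, 0, 1]
label = [1, 0, 0, 1, 1]
print(moving_iou(pred, label))  # prints 0.5
```

Points that are static in both the prediction and the ground truth do not enter the formula, so the score is not inflated by the large static background.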