Abstract: In autonomous driving, accurately distinguishing between static and moving objects is crucial. In the moving object segmentation (MOS) task, effectively leveraging motion information from objects is a primary challenge in improving the recognition of moving objects. Previous methods utilized either range view (RV) or bird's eye view (BEV) residual maps to capture motion information. Unlike these approaches, we propose combining RV and BEV residual maps to jointly exploit more of the available motion information. Thus, we introduce CV-MOS, a cross-view model for moving object segmentation. Notably, we decouple spatial-temporal information by capturing motion from BEV and RV residual maps and generating semantic features from range images, which are used as moving object guidance for the motion branch. Our direct and unique solution maximizes the use of range images and RV and BEV residual maps, significantly enhancing the performance of the LiDAR-based MOS task. Our method achieved leading IoU scores of 77.5% and 79.2% on the validation and test sets of the SemanticKITTI dataset. In particular, CV-MOS demonstrates SOTA performance to date on various datasets. The CV-MOS implementation is available at https://github.com/SCNU-RISLAB/CV-MOS
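The IoU scores quoted above measure overlap between predicted and ground-truth moving points. As a minimal sketch, assuming per-point integer labels in which a hypothetical `moving_label` value of 1 marks moving points (the label encoding is not stated in the abstract), the metric can be computed as TP / (TP + FP + FN):

```python
import numpy as np

def moving_iou(pred, gt, moving_label=1):
    """IoU of the moving class for per-point label arrays.

    `moving_label` is an illustrative assumption, not the dataset's encoding.
    """
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    p = pred == moving_label   # points predicted as moving
    g = gt == moving_label     # points labeled as moving
    tp = np.count_nonzero(p & g)
    fp = np.count_nonzero(p & ~g)
    fn = np.count_nonzero(~p & g)
    denom = tp + fp + fn
    # No moving points in either array: IoU is undefined.
    return tp / denom if denom else float("nan")

# Two true positives, one false positive, one false negative.
score = moving_iou([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(score)
```

Static points that are correctly predicted as static do not enter the score, so the metric stays informative even though moving points are typically a small fraction of a scan.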

What problem does this paper attempt to address?

This paper addresses the challenge of accurately distinguishing static and dynamic objects in an autonomous driving system. Specifically, it focuses on a central problem in the Moving Object Segmentation (MOS) task: how to use motion information from objects effectively to improve the recognition accuracy of moving objects.

#### Background and problem description

In autonomous driving, accurately identifying moving objects is crucial for downstream tasks such as obstacle avoidance, Simultaneous Localization And Mapping (SLAM), and path planning. However, traditional semantic segmentation assigns class labels (for example, car or pedestrian) but does not indicate whether an object is actually moving. In addition, the irregularity and sparsity of point-cloud data make the MOS task challenging.

#### Limitations of existing methods

Existing MOS methods mainly rely on one of two projections to capture motion information:

1. **Range View (RV) projection**: This projection is prone to boundary blurring for distant objects and is sensitive to distance changes.
2. **Bird's Eye View (BEV) projection**: Although the BEV projection handles occlusions better, it performs poorly on distant objects because these objects may contain only a few points.

#### Proposed solution

To address these problems, the authors propose CV-MOS (Cross-View Model for Motion Segmentation), a cross-view model that combines RV and BEV residual maps to use motion information more comprehensively. The main innovations of CV-MOS include:

1. **Cross-view motion branch structure**: An additional BEV motion branch works together with the traditional RV motion branch, making more effective use of multi-view motion information.
2. **Spatial-Channel Attention Module (SCAM)**: A 3D spatial-channel attention mechanism further optimizes the motion branch, reducing information loss and increasing inference speed.
3. **Fusion of multi-view features**: The BEV feature map is converted into the space of the RV feature map through a geometric calibration method so that the two can be fused effectively.

With these improvements, CV-MOS achieves leading IoU scores (77.5% and 79.2%) on the SemanticKITTI-MOS dataset and shows robust, superior performance on multiple benchmark datasets.

#### Summary

The core objective of this paper is to significantly improve the performance of LiDAR-based moving object segmentation by designing a cross-view network for capturing motion information, thereby providing more reliable support for the perception module of the autonomous driving system.
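The RV residual maps discussed above can be illustrated with a short sketch. It assumes a spherical range projection and a normalized range difference between the current scan and a pose-aligned previous scan, a formulation common in earlier RV-based MOS work; it is not confirmed to match the CV-MOS implementation. The function names, the 64 x 2048 image size, the field-of-view values, and the pose matrix `T_prev_to_cur` are all illustrative assumptions.

```python
import numpy as np

def range_projection(points, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) point cloud to an H x W range image (0 = empty pixel)."""
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down
    depth = np.linalg.norm(points, axis=1)
    keep = depth > 0                      # drop points at the sensor origin
    points, depth = points[keep], depth[keep]
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / depth)
    u = 0.5 * (1.0 - yaw / np.pi) * W            # column from azimuth
    v = (1.0 - (pitch - fov_down) / fov) * H     # row from elevation
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int64)
    # Write far points first so that nearer points overwrite them.
    order = np.argsort(depth)[::-1]
    image = np.zeros((H, W), dtype=np.float32)
    image[v[order], u[order]] = depth[order]
    return image

def rv_residual(cur_points, prev_points, T_prev_to_cur, **proj_kwargs):
    """Normalized range difference between the current and aligned previous scan."""
    # Move the previous scan into the current sensor frame (4 x 4 rigid transform).
    prev_h = np.c_[prev_points, np.ones(len(prev_points))]
    prev_in_cur = (T_prev_to_cur @ prev_h.T).T[:, :3]
    r_cur = range_projection(cur_points, **proj_kwargs)
    r_prev = range_projection(prev_in_cur, **proj_kwargs)
    valid = (r_cur > 0) & (r_prev > 0)    # pixels observed in both scans
    residual = np.zeros_like(r_cur)
    residual[valid] = np.abs(r_cur[valid] - r_prev[valid]) / r_cur[valid]
    return residual

# One point moves 2 m closer along the x axis; the other stays put.
cur = np.array([[10.0, 0.0, 0.0], [5.0, 5.0, 0.0]])
prev = np.array([[12.0, 0.0, 0.0], [5.0, 5.0, 0.0]])
res = rv_residual(cur, prev, np.eye(4))
print(res.max(), np.count_nonzero(res))
```

In this toy example only the pixel containing the moving point receives a nonzero residual, which is the cue a motion branch would consume. A BEV residual would follow the same idea on a top-down grid instead of a spherical image.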

CV-MOS: A Cross-View Model for Motion Segmentation

MF-MOS: A Motion-Focused Model for Moving Object Segmentation

MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation

MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

3D-SeqMOS: A Novel Sequential 3D Moving Object Segmentation in Autonomous Driving

LiDAR Video Object Segmentation with Dynamic Kernel Refinement

MT-SSD: Single-Stage 3D Object Detector Based on Magnification Transformation

SSF-MOS: Semantic Scene Flow Assisted Moving Object Segmentation for Autonomous Vehicles

Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation

Event-Free Moving Object Segmentation from Moving Ego Vehicle

3D convolutional long short-term encoder-decoder network for moving object segmentation

Semantics-Guided Moving Object Segmentation with 3D LiDAR

Real-time Moving Object Segmentation with Tracking and Tracklet Belief

Real-Time LiDAR Point-Cloud Moving Object Segmentation for Autonomous Driving

OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Geometric and Semantic Guidances

A Joint Object Detection and Semantic Segmentation Model with Cross-Attention and Inner-Attention Mechanisms

Simple Scalable Multimodal Semantic Segmentation Model

GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model

StreamMOS: Streaming Moving Object Segmentation with Multi-View Perception and Dual-Span Memory

Segment as Points for Efficient and Effective Online Multi-Object Tracking and Segmentation