CV-MOS: A Cross-View Model for Motion Segmentation

Xiaoyu Tang, Zeyu Chen, Jintao Cheng, Xieyuanli Chen, Jin Wu, Bohuan Xue
2024-08-25
Abstract: In autonomous driving, accurately distinguishing between static and moving objects is crucial. In the moving object segmentation (MOS) task, effectively leveraging the motion information of objects is a primary challenge in improving the recognition of moving objects. Previous methods used either range view (RV) or bird's eye view (BEV) residual maps to capture motion information. Unlike these approaches, we propose combining RV and BEV residual maps to jointly exploit more of the available motion information. We therefore introduce CV-MOS, a cross-view model for moving object segmentation. The novelty is that we decouple spatial-temporal information: motion is captured from the BEV and RV residual maps, while semantic features are generated from range images and used as moving-object guidance for the motion branch. This direct solution makes full use of range images and of RV and BEV residual maps, significantly enhancing performance on the LiDAR-based MOS task. Our method achieved leading IoU scores of 77.5% and 79.2% on the validation and test sets of the SemanticKITTI dataset. In particular, CV-MOS demonstrates SOTA performance to date on various datasets. The CV-MOS implementation is available at https://github.com/SCNU-RISLAB/CV-MOS
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses a key challenge in autonomous driving systems: accurately distinguishing static objects from moving ones. Specifically, it focuses on a central problem in the Moving Object Segmentation (MOS) task: how to use the motion information of objects effectively to improve the recognition accuracy of moving objects.

#### Background and problem description

In autonomous driving, accurately identifying moving objects is crucial for downstream tasks such as obstacle avoidance, Simultaneous Localization And Mapping (SLAM), and path planning. However, traditional semantic segmentation does not distinguish moving objects from static ones. In addition, the irregularity and sparsity of point-cloud data make the MOS task challenging.

#### Limitations of existing methods

Existing MOS methods mainly rely on one of two projections to capture motion information:

1. **Range View (RV) projection**: This projection is prone to boundary blurring for distant objects and is sensitive to distance changes.
2. **Bird's Eye View (BEV) projection**: The BEV projection handles occlusions better, but it performs poorly on distant objects because such objects may contain only a few points.

#### Proposed solution

To address these problems, the authors propose CV-MOS (Cross-View Model for Motion Segmentation), a cross-view model that combines RV and BEV residual maps to use motion information more comprehensively. The main innovations of CV-MOS are:

1. **Cross-view motion branch structure**: An additional BEV motion branch works together with the traditional RV motion branch, making more effective use of multi-view motion information.
2. **Spatial-Channel Attention Module (SCAM)**: A 3D spatial-channel attention mechanism further optimizes the motion branch, reducing information loss and increasing inference speed.
3. **Fusion of multi-view features**: The BEV feature map is converted into the space of the RV feature map through a geometric calibration method so that the two can be fused effectively.

With these improvements, CV-MOS achieves leading IoU scores on the SemanticKITTI-MOS dataset (77.5% on the validation set and 79.2% on the test set) and shows robust, superior performance on multiple benchmark datasets.

#### Summary

The core objective of this paper is to significantly improve the performance of LiDAR-based moving object segmentation by designing a cross-view network for capturing motion information, thereby providing more reliable support for the perception module of an autonomous driving system.
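The RV and BEV residual maps mentioned above can be illustrated with a minimal NumPy sketch. It is not the CV-MOS implementation: the function names, the sensor field of view (3° up, 25° down), the 64 × 2048 range image, the Cartesian 256 × 256 BEV grid, and both residual formulas are assumptions chosen for illustration. It also assumes the previous scan has already been transformed into the coordinate frame of the current scan.

```python
import numpy as np


def range_projection(points, height=64, width=2048,
                     fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) point cloud to a range image (empty pixels are 0).

    The field of view and image size are assumed values, not taken from the paper.
    """
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    depth = np.linalg.norm(points, axis=1)
    keep = depth > 0
    points, depth = points[keep], depth[keep]
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / depth)
    # Map the angles to pixel coordinates and clamp to the image bounds.
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (1.0 - (pitch - fov_down) / (fov_up - fov_down)) * height
    u = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int64)
    # Keep the nearest point when several points fall into one pixel.
    image = np.full((height, width), np.inf, dtype=np.float32)
    np.minimum.at(image, (v, u), depth.astype(np.float32))
    image[np.isinf(image)] = 0.0
    return image


def bev_projection(points, grid=256, extent=50.0, z_min=-4.0):
    """Project an (N, 3) point cloud to a Cartesian BEV grid.

    Each cell stores the maximum height above z_min (empty cells are 0).
    The grid layout is an assumption made for this sketch.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = (np.abs(x) < extent) & (np.abs(y) < extent) & (z > z_min)
    cell = 2.0 * extent / grid
    col = np.clip(np.floor((x[keep] + extent) / cell), 0, grid - 1).astype(np.int64)
    row = np.clip(np.floor((y[keep] + extent) / cell), 0, grid - 1).astype(np.int64)
    image = np.zeros((grid, grid), dtype=np.float32)
    np.maximum.at(image, (row, col), (z[keep] - z_min).astype(np.float32))
    return image


def rv_residual(current, previous):
    """Normalised absolute range difference on pixels valid in both images."""
    valid = (current > 0) & (previous > 0)
    residual = np.zeros_like(current)
    residual[valid] = np.abs(current[valid] - previous[valid]) / current[valid]
    return residual


def bev_residual(current, previous):
    """Absolute per-cell difference between two BEV maps."""
    return np.abs(current - previous)


# Toy scans: one static point and one point that moves 2 m along the y axis.
previous_scan = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
current_scan = np.array([[10.0, 0.0, 0.0], [0.0, 12.0, 0.0]])

rv_map = rv_residual(range_projection(current_scan),
                     range_projection(previous_scan))
bev_map = bev_residual(bev_projection(current_scan),
                       bev_projection(previous_scan))
```

In this toy case the static point yields a zero residual in both views, while the moving point shows up as a change in range in the RV map and as a vacated plus a newly occupied cell in the BEV map. This is the complementary behaviour that motivates combining the two views.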