LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving

Sambit Mohapatra, Senthil Yogamani, Varun Ravi Kumar, Stefan Milz, Heinrich Gotzig, Patrick Mäder
2024-11-19
Abstract: LiDAR is crucial for robust 3D scene perception in autonomous driving. LiDAR perception has the largest body of literature after camera perception. However, multi-task learning across tasks like detection, segmentation, and motion estimation using LiDAR remains relatively unexplored, especially on automotive-grade embedded platforms. We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantic segmentation, and motion segmentation. The unified architecture comprises a shared encoder and task-specific decoders, enabling joint representation learning. We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection. Our heterogeneous training scheme combines diverse datasets and exploits complementary cues between tasks. The work provides the first embedded implementation unifying these key perception tasks from LiDAR point clouds, achieving 3 ms latency on the embedded NVIDIA Xavier platform. We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection. By maximizing hardware efficiency and leveraging multi-task synergies, our method delivers an accurate and efficient solution tailored for real-world automated driving deployment. Qualitative results can be seen at https://youtu.be/H-hWRzv2lIY.
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
This paper addresses how to build an efficient, real-time multi-task perception network from LiDAR (Light Detection and Ranging) point cloud data for autonomous driving. Specifically, the authors aim to develop a multi-task convolutional neural network (CNN) that runs in real time on an embedded platform and performs object detection, semantic segmentation, and motion segmentation simultaneously. The key challenges are:

1. **Multi-Task Learning (MTL)**:
   - How to integrate multiple related tasks (object detection, semantic segmentation, and motion segmentation) so that the model generalizes and performs better.
   - How to share features among tasks while ensuring that their optimization goals do not conflict with each other or lead to overfitting.
2. **Computational Efficiency and Hardware Compatibility**:
   - Achieve real-time processing with low latency (e.g., 3 ms), especially on an embedded platform such as NVIDIA Xavier.
   - Design a lightweight, efficient network architecture that fits the resource limits of in-vehicle embedded systems.
3. **Dataset Diversity and Insufficient Labeling**:
   - Datasets for different tasks vary in scale and quality, and some tasks may lack sufficient labeled data.
   - How to combine multiple datasets through a heterogeneous training strategy so that complementary information among tasks is fully used.
4. **Feature Enhancement and Transfer**:
   - Propose a new Semantic Weighting and Guidance (SWAG) module that selectively transfers semantic features to improve object detection accuracy.
   - Apply a simple distance-based point cloud densification technique, which can also be used at test time, to increase point density and improve the detection of distant objects.
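The summary above does not spell out how the distance-based densification works. As a rough illustration only, the sketch below adds slightly jittered copies of every point beyond a range threshold, so that sparse far-away regions become denser. The function name `densify_by_distance`, the threshold `min_range`, the number of `copies`, and the `jitter` size are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def densify_by_distance(points, min_range=30.0, copies=2, jitter=0.05, seed=0):
    """Return the input points plus jittered copies of the distant ones.

    points: list of (x, y, z) tuples in metres, sensor at the origin.
    min_range, copies and jitter are illustrative values, not the paper's.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    dense = list(points)
    for x, y, z in points:
        # Horizontal distance from the sensor decides whether to densify.
        if math.hypot(x, y) > min_range:
            for _ in range(copies):
                dense.append((
                    x + rng.uniform(-jitter, jitter),
                    y + rng.uniform(-jitter, jitter),
                    z + rng.uniform(-jitter, jitter),
                ))
    return dense


# One near point and two far points: only the far ones gain copies.
cloud = [(5.0, 1.0, 0.2), (40.0, 3.0, 0.5), (0.0, 55.0, 1.0)]
dense_cloud = densify_by_distance(cloud)
print(len(cloud), "->", len(dense_cloud))
```

Because the procedure needs no labels, a scheme of this kind can be applied at inference time as well as during training, which matches the claim that density can also be increased during testing.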
### Specific Problem Summary

- **Object Detection**: Identify and locate objects in LiDAR point clouds and generate bounding boxes.
- **Semantic Segmentation**: Assign a semantic label such as "vehicle" or "pedestrian" to each point.
- **Motion Segmentation**: Distinguish between dynamic and static objects, which is important for environment modeling and SLAM (Simultaneous Localization and Mapping).

### Solution Highlights

- **Unified Architecture**: A shared encoder with task-specific decoders enables joint representation learning.
- **SWAG Module**: Selectively transfers semantic features to improve object detection.
- **Heterogeneous Training Strategy**: Combines different datasets to make full use of multi-task synergy.
- **Point Cloud Densification**: A simple, effective distance-based densification method improves the detection accuracy of distant objects.

Through these innovations, the paper demonstrates that an efficient, real-time LiDAR multi-task perception network can be implemented on an embedded platform, and it reports results close to or exceeding the state of the art on benchmark datasets such as KITTI and Waymo.
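Heterogeneous training combines datasets that each provide labels for only some of the tasks. The summary does not describe the authors' exact scheme, so the sketch below shows one common approach as an assumption: each sample contributes only the losses for which it has labels, combined as a weighted sum. The function name `multitask_loss`, the task names, and the weights are hypothetical.

```python
def multitask_loss(task_losses, weights):
    """Weighted sum of the losses for tasks that are labeled in this sample.

    task_losses maps a task name to its loss value, or to None when the
    sample's source dataset has no annotation for that task.
    """
    total = 0.0
    for task, loss in task_losses.items():
        if loss is not None:  # skip tasks without labels
            total += weights[task] * loss
    return total


# Illustrative weights, not values from the paper.
weights = {"detection": 1.0, "semantic": 0.5, "motion": 0.5}

# A sample from a dataset with boxes only, and one with segmentation labels only.
boxes_only = {"detection": 0.75, "semantic": None, "motion": None}
masks_only = {"detection": None, "semantic": 0.5, "motion": 0.25}

loss_boxes = multitask_loss(boxes_only, weights)
loss_masks = multitask_loss(masks_only, weights)
print(loss_boxes, loss_masks)
```

With a shared encoder, gradients from either kind of sample still update the common features, so every dataset helps the shared representation even when it only supervises one decoder.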