Abstract: Semantic map construction under bird's-eye view (BEV) plays an essential role in autonomous driving. In contrast to camera images, LiDAR provides accurate 3D observations, so the captured 3D features can be projected onto BEV space directly. However, the vanilla LiDAR-based BEV feature is often noisy, and its spatial features carry little texture and few semantic cues. In this paper, we propose an effective LiDAR-based method to build semantic maps. Specifically, we introduce a BEV feature pyramid decoder that learns robust multi-scale BEV features for semantic map construction, which greatly boosts the accuracy of the LiDAR-based method. To mitigate the defects caused by the lack of semantic cues in LiDAR data, we present an online Camera-to-LiDAR distillation scheme to facilitate semantic learning from image to point cloud. Our distillation scheme consists of feature-level and logit-level distillation to absorb the semantic information from the camera in BEV. The experimental results on the challenging nuScenes dataset demonstrate the efficacy of our proposed LiDAR2Map on semantic map construction: it outperforms the previous LiDAR-based methods by 27.9% mIoU and even performs better than the state-of-the-art camera-based approaches. Source code is available at https://github.com/songw-zju/LiDAR2Map

What problem does this paper attempt to address?

The paper addresses how to construct semantic maps effectively for autonomous driving. Specifically, the authors propose a LiDAR-based method that builds semantic maps in bird's-eye view (BEV) and overcomes limitations of existing methods.

### Background Issues

1. **Limitations of Camera Images**:
   - Camera images provide rich texture and semantic information, but they suffer from spatial distortion when used to construct semantic maps. Camera-based methods also rely on high-resolution images and large pre-trained models, which poses challenges in practical applications.
2. **Limitations of LiDAR Data**:
   - LiDAR provides accurate 3D spatial information, but the BEV features it generates are often noisy and lack texture and semantic cues.

### Proposed Method

To overcome these issues, the authors propose **LiDAR2Map**, which has two main components:

1. **BEV Feature Pyramid Decoder (BEV-FPD)**:
   - An efficient decoder that learns robust multi-scale BEV feature representations from the precise spatial information in LiDAR point clouds. This improves the accuracy of the baseline model.
2. **Online Camera-to-LiDAR Distillation Scheme**:
   - An online scheme that transfers semantic information from images to LiDAR data through feature-level and logit-level distillation. It includes:
     - **Position-Guided Feature Fusion Module (PGF2M)**: Fuses camera and LiDAR features in BEV space.
     - **Feature-Level Distillation (FD)**: Generates a global affinity map through a tree filter to perform feature-level distillation.
     - **Logit-Level Distillation (LD)**: Measures the similarity of probability distributions with KL divergence, so that the LiDAR branch learns soft labels from the camera-LiDAR fusion model.

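
The logit-level distillation described above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the KL direction (teacher to student), the `temperature` parameter, and the mean reduction over BEV cells are all assumptions made for illustration. The fusion model plays the teacher and the LiDAR branch plays the student.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one list of per-class logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def logit_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean KL(teacher || student) over BEV cells (assumed reduction).

    Each argument is a list with one per-class logit vector per BEV cell.
    The teacher distribution acts as the soft label for the student.
    """
    losses = []
    for t_cell, s_cell in zip(teacher_logits, student_logits):
        p_teacher = softmax(t_cell, temperature)
        p_student = softmax(s_cell, temperature)
        losses.append(kl_divergence(p_teacher, p_student))
    return sum(losses) / len(losses)


# Hypothetical logits for two BEV cells and three classes.
fusion_logits = [[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]]
lidar_logits = [[1.0, 0.8, -0.5], [0.0, 0.9, 0.6]]
loss = logit_distillation_loss(fusion_logits, lidar_logits)
```

The loss is zero when the student reproduces the teacher's distribution in every cell and grows as the two diverge. In a training framework the same quantity would normally be computed on tensors, with gradients flowing only into the student.
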
### Experimental Results

- Experimental results on the nuScenes dataset show that LiDAR2Map significantly outperforms existing LiDAR-based methods on semantic map construction, with mIoU improving from 29.5% to 57.4%.
- On vehicle segmentation, LiDAR2Map also performs well: it surpasses existing camera-based methods in accuracy and has advantages in parameter count and inference speed.

### Main Contributions

1. An efficient framework, LiDAR2Map, is proposed, in which the BEV Feature Pyramid Decoder learns robust BEV feature representations and improves the performance of the baseline model.
2. An effective online camera-to-LiDAR distillation scheme is introduced, performing feature-level and logit-level distillation during training to absorb the semantic representations from images.
3. Extensive experiments are conducted on the nuScenes dataset, covering map and vehicle segmentation tasks, and demonstrate the superior performance of the proposed method.

In summary, the paper proposes an efficient and accurate semantic map construction method that combines the precise spatial information of LiDAR with the rich semantic information of cameras.
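
For readers unfamiliar with the metric behind the reported results, the following sketch shows how mean intersection-over-union (mIoU) is commonly computed for per-class binary BEV masks. The class names `class_a` and `class_b`, the mask layout, and the choice to skip classes with an empty union are assumptions for illustration, not details taken from the paper or its evaluation code.

```python
def iou(pred, target):
    """IoU between two binary masks given as equally shaped nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # A class absent from both masks has no defined IoU.
    return intersection / union if union else float("nan")


def mean_iou(pred_by_class, target_by_class):
    """Average the per-class IoU, skipping classes whose IoU is undefined."""
    scores = [iou(pred_by_class[name], target_by_class[name])
              for name in target_by_class]
    valid = [s for s in scores if s == s]  # NaN is the only value not equal to itself
    return sum(valid) / len(valid) if valid else float("nan")


# Hypothetical 2x2 BEV masks for two placeholder classes.
pred = {"class_a": [[1, 1], [0, 0]], "class_b": [[0, 1], [0, 1]]}
target = {"class_a": [[1, 0], [0, 0]], "class_b": [[0, 1], [0, 1]]}
score = mean_iou(pred, target)  # class_a: 1/2, class_b: 2/2, mean: 0.75
```

Under this metric, the improvement from 29.5% to 57.4% quoted above corresponds to a gain of 27.9 percentage points.
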

LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation
