DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation

Yueru Luo, Shuguang Cui, Zhen Li
2024-06-23
Abstract: Accurate 3D lane estimation is crucial for ensuring safety in autonomous driving. However, prevailing monocular techniques suffer from depth loss and lighting variations, hampering accurate 3D lane detection. In contrast, LiDAR points offer geometric cues and enable precise localization. In this paper, we present DV-3DLane, a novel end-to-end Dual-View multi-modal 3D Lane detection framework that synergizes the strengths of both images and LiDAR points. We propose to learn multi-modal features in dual-view spaces, i.e., perspective view (PV) and bird's-eye-view (BEV), effectively leveraging the modal-specific information. To achieve this, we introduce three designs: 1) A bidirectional feature fusion strategy that integrates multi-modal features into each view space, exploiting their unique strengths. 2) A unified query generation approach that leverages lane-aware knowledge from both PV and BEV spaces to generate queries. 3) A 3D dual-view deformable attention mechanism, which aggregates discriminative features from both PV and BEV spaces into queries for accurate 3D lane detection. Extensive experiments on the public benchmark, OpenLane, demonstrate the efficacy and efficiency of DV-3DLane. It achieves state-of-the-art performance, with a remarkable 11.2 gain in F1 score and a substantial 53.5% reduction in errors. The code is available at https://github.com/JMoonr/dv-3dlane.
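Both views in the abstract rest on one geometric step: placing LiDAR points into the image plane (PV) and onto a ground-plane grid (BEV). The sketch below shows only that step, under assumed conventions: a 4x4 LiDAR-to-camera extrinsic `T_cam_from_lidar`, a 3x3 pinhole intrinsic `K` with the camera z-axis pointing forward, and a metric BEV range. The function names are hypothetical, and this is not the authors' code, which additionally runs learned encoders on both representations before fusing them.

```python
import numpy as np

def lidar_to_pv_depth(points, T_cam_from_lidar, K, height, width):
    """Project LiDAR points (N, 3) into a sparse PV depth image (H, W).

    Pixels hit by several points keep the nearest depth; empty pixels stay 0.
    """
    n = points.shape[0]
    homo = np.hstack([points, np.ones((n, 1))])        # (N, 4) homogeneous coords
    cam = (T_cam_from_lidar @ homo.T).T[:, :3]          # (N, 3) in the camera frame
    cam = cam[cam[:, 2] > 0.1]                          # drop points behind or too close to the camera
    uvw = (K @ cam.T).T                                 # (M, 3) unnormalised pixel coords
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    depth = cam[:, 2]
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, depth = u[valid], v[valid], depth[valid]
    depth_img = np.full((height, width), np.inf)
    np.minimum.at(depth_img, (v, u), depth)             # keep the nearest return per pixel
    depth_img[np.isinf(depth_img)] = 0.0                # pixels without any point
    return depth_img

def lidar_to_bev_occupancy(points, x_range, y_range, resolution):
    """Rasterise LiDAR points into a BEV occupancy grid (rows = x, cols = y)."""
    nx = int(round((x_range[1] - x_range[0]) / resolution))
    ny = int(round((y_range[1] - y_range[0]) / resolution))
    ix = np.floor((points[:, 0] - x_range[0]) / resolution).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / resolution).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    grid = np.zeros((nx, ny), dtype=np.float32)
    grid[ix[valid], iy[valid]] = 1.0                    # mark cells containing at least one point
    return grid
```

For example, with a LiDAR frame of x forward, y left, z up and a camera at the same origin, a point 10 m straight ahead lands on the principal point with depth 10, while a point behind the sensor is discarded in PV and falls outside a forward-only BEV range.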
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the accuracy of 3D lane detection in autonomous driving, in particular how to keep detection reliable in complex and changing environments such as different weather and lighting conditions. To tackle this problem, the authors propose DV-3DLane, an end-to-end dual-view multi-modal 3D lane detection framework. The problems addressed in the paper can be summarized as follows:

1. **Overcoming the limitations of monocular techniques**: Monocular camera-based methods lose depth information and are sensitive to lighting variations, which leads to inaccurate 3D lane estimates.
2. **Leveraging the advantages of LiDAR**: Compared with a monocular camera, LiDAR provides direct geometric cues and more precise spatial localization, which helps improve the accuracy of 3D lane detection.
3. **Fusing image and LiDAR data**: By effectively fusing images and LiDAR points, the method aims to exploit the complementary strengths of both modalities.

To achieve these goals, the paper proposes the following key components:

- **Bidirectional Feature Fusion (BFF) strategy**: Integrates multi-modal features into each of the two view spaces, the image space (perspective view, PV) and the bird's-eye view (BEV), so that each view benefits from the complementary information of both modalities.
- **Unified Query Generator (UQG)**: Generates two sets of lane-aware queries, one from PV and one from BEV, and merges them into a unified query set for the subsequent decoding.
- **3D Dual-View Deformable Attention mechanism**: Aggregates discriminative features from both the PV and BEV spaces into the queries, improving the accuracy of 3D lane detection.

With these designs, DV-3DLane achieves state-of-the-art results on the OpenLane benchmark, including an 11.2-point gain in F1 score and a 53.5% reduction in errors.
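The dual-view deformable attention described above reads features for each query at a handful of 3D sampling points, looking each point up in both the PV and the BEV feature maps. A minimal single-head, single-scale sketch follows, assuming NumPy, equal channel counts for the two maps, a pinhole camera (`T_cam_from_ego`, `K`), and a metric BEV grid. The offsets and weights are passed in directly here, whereas deformable attention normally predicts them from the query embedding; summing both views under one set of normalized weights is also an assumption, not necessarily how DV-3DLane combines them. All names are hypothetical.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly sample a (C, H, W) map at continuous coords (x = column, y = row).

    Neighbours outside the map contribute zero, as with zero padding.
    """
    c, h, w = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    out = np.zeros(c)
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                wgt = (1 - abs(x - xx)) * (1 - abs(y - yy))
                out += wgt * feat[:, yy, xx]
    return out

def dual_view_sample(ref_points, offsets, weights, pv_feat, bev_feat,
                     T_cam_from_ego, K, x_range, y_range, bev_res):
    """Aggregate PV and BEV features for one query (hypothetical helper).

    ref_points: (P, 3) 3D reference points in the ego frame (metres).
    offsets:    (P, S, 3) sampling offsets around each reference point.
    weights:    (2, P, S) attention weights for PV (index 0) and BEV (index 1),
                assumed already normalised, e.g. by a softmax over all samples.
    """
    pts = (ref_points[:, None, :] + offsets).reshape(-1, 3)   # (P*S, 3) sampling points
    w_pv = weights[0].reshape(-1)
    w_bev = weights[1].reshape(-1)
    out = np.zeros(pv_feat.shape[0])
    for p, a_pv, a_bev in zip(pts, w_pv, w_bev):
        # BEV view: drop the height and map metres to grid cells (rows = x, cols = y)
        row = (p[0] - x_range[0]) / bev_res
        col = (p[1] - y_range[0]) / bev_res
        out += a_bev * bilinear_sample(bev_feat, col, row)
        # PV view: pinhole projection; points behind the camera contribute nothing
        cam = T_cam_from_ego @ np.append(p, 1.0)
        if cam[2] > 0.1:
            uvw = K @ cam[:3]
            out += a_pv * bilinear_sample(pv_feat, uvw[0] / uvw[2], uvw[1] / uvw[2])
    return out
```

Projecting the same 3D point into both views is what lets a single query combine geometry-rich BEV features with appearance-rich PV features; points that fall outside a map or behind the camera simply add nothing to the result.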
Experimental results also show that the method remains accurate under a stricter distance threshold of 0.5 meters (compared with the standard 1.5 meters), demonstrating its potential for improving the safety of autonomous driving.
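To make the effect of the distance threshold concrete, the sketch below computes a simplified lane-level F1 score: a predicted lane counts as a true positive only if enough of its points lie within the threshold of a not-yet-matched ground-truth lane. The default 1.5 m threshold and the 75% point ratio are assumptions loosely modeled on the OpenLane protocol, greedy one-to-one matching stands in for the benchmark's optimal assignment, and lanes are assumed to be sampled at the same longitudinal positions. `lane_f1` is a hypothetical name.

```python
import numpy as np

def lane_f1(pred_lanes, gt_lanes, dist_thresh=1.5, min_match_ratio=0.75):
    """Simplified lane-level F1 under a distance threshold (in metres).

    Each lane is an (N, 3) array sampled at the same N longitudinal positions.
    A pred/GT pair matches when at least `min_match_ratio` of the points lie
    within `dist_thresh`. Matching is greedy one-to-one (a simplification).
    """
    used_gt = set()
    tp = 0
    for pred in pred_lanes:
        best, best_ratio = None, 0.0
        for j, gt in enumerate(gt_lanes):
            if j in used_gt:
                continue
            dists = np.linalg.norm(pred - gt, axis=1)      # point-wise 3D distances
            ratio = float(np.mean(dists < dist_thresh))    # fraction of close points
            if ratio >= min_match_ratio and ratio > best_ratio:
                best, best_ratio = j, ratio
        if best is not None:
            used_gt.add(best)
            tp += 1
    precision = tp / len(pred_lanes) if len(pred_lanes) > 0 else 0.0
    recall = tp / len(gt_lanes) if len(gt_lanes) > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Tightening the threshold can turn a match into a miss: a prediction offset laterally by 1.0 m from its ground-truth lane scores F1 = 1.0 at 1.5 m but 0.0 at 0.5 m, which is why results at the stricter threshold are a harder test of localization accuracy.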