HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning

Wenzhao Qiu, Shanmin Pang, Hao Zhang, Jianwu Fang, Jianru Xue
2024-11-03
Abstract: Recent advances in high-definition (HD) map construction from surround-view images have highlighted their cost-effectiveness in deployment. However, prevailing techniques often fall short in accurately extracting and utilizing road features, as well as in the implementation of view transformation. In response, we introduce HeightMapNet, a novel framework that establishes a dynamic relationship between image features and road surface height distributions. By integrating height priors, our approach refines the accuracy of Bird's-Eye-View (BEV) features beyond conventional methods. HeightMapNet also introduces a foreground-background separation network that sharply distinguishes between critical road elements and extraneous background components, enabling precise focus on detailed road micro-features. Additionally, our method leverages multi-scale features within the BEV space, optimally utilizing spatial geometric information to boost model performance. HeightMapNet has shown exceptional results on the challenging nuScenes and Argoverse 2 datasets, outperforming several widely recognized approaches. The code will be available at https://github.com/adasfag/HeightMapNet/.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses several key challenges in high-definition (HD) map construction:

1. **Accuracy of road feature extraction and utilization**: Existing techniques struggle to accurately extract and use road features from surround-view images, which degrades the quality and reliability of the resulting HD maps.
2. **Implementation of view transformation**: Existing methods are often imprecise when transforming between views, especially when handling environmental details such as the height distribution of the road surface, which limits the model's understanding of the scene.
3. **Separation of background and foreground**: Most existing studies do not effectively filter out non-critical elements (such as the sky and other irrelevant background features) from multi-view image features, so the model is easily distracted by irrelevant data and its perception output becomes less accurate and reliable.
4. **Utilization of multi-scale features**: Current research tends to use single-layer image features for computational efficiency and overlooks the benefits of multi-scale feature fusion in the BEV space, which limits the model's effectiveness in complex road environments.

To address these problems, the paper proposes the HeightMapNet framework, which improves HD map construction through the following innovations:

- **Advanced view transformation module**: Dynamically links image features with the height distribution of the road surface, enhancing spatial understanding, and can be integrated into attention-based neural networks.
- **Foreground-background separation network**: Uses self-supervised learning to optimize feature extraction in the road environment and removes irrelevant background elements, improving the clarity and quality of the input features and therefore the reliability of the perception results.
- **Multi-scale feature fusion mechanism**: Fuses multi-scale features in the BEV space, improving the accuracy and robustness of map construction, especially in complex road environments.

Together, these innovations enable HeightMapNet to perform strongly on the challenging nuScenes and Argoverse 2 datasets, surpassing several widely recognized methods.
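The idea of linking image features to a road-surface height distribution can be illustrated with a small, self-contained sketch. This is not the authors' implementation: it assumes a pinhole camera, a discrete set of candidate heights per BEV cell, and a softmax over per-height scores as the "height distribution". All names here (`height_weighted_feature`, `height_bins`, `height_logits`, `camera_height`) are hypothetical and chosen only for illustration. For one BEV ground location, the sketch projects a 3D point at each candidate height into the image, looks up the feature there, and blends the lookups by their height probabilities.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project(point, focal, cx, cy):
    """Pinhole projection of a camera-frame point (x right, y down, z forward)."""
    x, y, z = point
    if z <= 0:
        return None  # point is behind the camera
    return (focal * x / z + cx, focal * y / z + cy)


def sample(feature_map, uv):
    """Look up the feature at the rounded pixel position; None if off-image."""
    if uv is None:
        return None
    col, row = round(uv[0]), round(uv[1])
    if 0 <= row < len(feature_map) and 0 <= col < len(feature_map[0]):
        return feature_map[row][col]
    return None


def height_weighted_feature(feature_map, ground_xz, height_bins, height_logits,
                            camera_height, focal, cx, cy):
    """Blend image features sampled at several candidate heights (assumed scheme).

    feature_map: 2D grid (rows x cols) of per-pixel feature vectors.
    ground_xz: (x, z) location of the BEV cell in the camera frame.
    height_bins: candidate heights above the ground plane, in metres.
    height_logits: one score per height bin (hypothetically predicted by a network).
    """
    weights = softmax(height_logits)
    channels = len(feature_map[0][0])
    fused = [0.0] * channels
    for height, weight in zip(height_bins, weights):
        # y points down, so a point `height` above the ground sits at
        # y = camera_height - height in the camera frame.
        point = (ground_xz[0], camera_height - height, ground_xz[1])
        feature = sample(feature_map, project(point, focal, cx, cy))
        if feature is None:
            continue  # this height hypothesis falls outside the image
        for c in range(channels):
            fused[c] += weight * feature[c]
    return fused


# Toy 4x4 feature map with one channel whose value equals the row index.
toy_map = [[[float(row)] for _ in range(4)] for row in range(4)]
uniform = height_weighted_feature(toy_map, (0.0, 1.0), [0.0, 1.0], [0.0, 0.0],
                                  camera_height=1.0, focal=1.0, cx=1.0, cy=1.0)
confident = height_weighted_feature(toy_map, (0.0, 1.0), [0.0, 1.0], [100.0, 0.0],
                                    camera_height=1.0, focal=1.0, cx=1.0, cy=1.0)
```

In the toy example, the two height hypotheses land on image rows 2 and 1. A uniform height distribution averages the two sampled features, while a distribution concentrated on height 0 returns essentially the row-2 feature. A learned model would typically replace the rounded lookup with bilinear sampling and predict the height scores from image or BEV features; those details are assumptions that go beyond what the summary states.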