HeightFormer: A Multilevel Interaction and Image-Adaptive Classification–Regression Network for Monocular Height Estimation with Aerial Images

Zhan Chen, Yidan Zhang, Xiyu Qi, Yongqiang Mao, Xin Zhou, Lei Wang, Yunping Ge
DOI: https://doi.org/10.3390/rs16020295
IF: 5
2024-01-12
Remote Sensing
Abstract: Height estimation has long been a pivotal topic within measurement and remote sensing disciplines, with monocular height estimation offering wide-ranging data sources and convenient deployment. This paper addresses the existing challenges in monocular height estimation methods, namely the difficulty in simultaneously achieving high-quality instance-level height and edge reconstruction, along with high computational complexity. This paper presents a comprehensive solution for monocular height estimation in remote sensing, termed HeightFormer, combining multilevel interactions and image-adaptive classification–regression. It features the Multilevel Interaction Backbone (MIB) and the Image-adaptive Classification–regression Height Generator (ICG). MIB supplements the fixed sampling grid of the CNN in the conventional backbone network with tokens of different interaction ranges. It is complemented by a pixel-, patch-, and feature map-level hierarchical interaction mechanism, designed to relay spatial geometry information across different scales and to introduce a global receptive field that enhances the quality of instance-level height estimation. The ICG dynamically generates a height partition for each image and reframes the traditional regression task as a coarse-to-fine classification–regression refinement, which significantly mitigates the innate ill-posedness of the problem and drastically improves edge sharpness. Finally, experiments on the Vaihingen and Potsdam datasets demonstrate that the proposed method surpasses existing techniques.
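To make "tokens of different interaction ranges" concrete, here is a toy sketch on a 1D list of scalar features. It is not the paper's MIB. All of its pieces are hypothetical stand-ins chosen for brevity: the three levels (a fixed 3-tap neighborhood in place of the CNN sampling grid, attention restricted to non-overlapping patches, and attention over one mean-pooled summary token per patch as a feature map-level, global interaction) and the function names `pixel_level`, `patch_level`, `map_level`, and `multilevel_block`.

```python
import math
from typing import List

# Hypothetical toy illustration of three interaction ranges; NOT HeightFormer's MIB.


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: float, keys: List[float]) -> float:
    """Toy dot-product attention on scalar features (values are the keys)."""
    weights = softmax([query * k for k in keys])
    return sum(w * k for w, k in zip(weights, keys))


def pixel_level(x: List[float]) -> List[float]:
    """Fixed 3-tap neighborhood, like a CNN's fixed sampling grid (edges clamped)."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3 for i in range(n)]


def patch_level(x: List[float], patch: int) -> List[float]:
    """Each position attends only to positions inside its own patch."""
    out: List[float] = []
    for start in range(0, len(x), patch):
        block = x[start:start + patch]
        out.extend(attend(q, block) for q in block)
    return out


def map_level(x: List[float], patch: int) -> List[float]:
    """Each position attends to one mean-pooled token per patch (global receptive field)."""
    tokens = [sum(x[s:s + patch]) / len(x[s:s + patch]) for s in range(0, len(x), patch)]
    return [attend(q, tokens) for q in x]


def multilevel_block(x: List[float], patch: int = 4) -> List[float]:
    """Residual sum of the input and the three interaction levels."""
    levels = (pixel_level(x), patch_level(x, patch), map_level(x, patch))
    return [xi + a + b + c for xi, a, b, c in zip(x, *levels)]


if __name__ == "__main__":
    feat = [0.1, 0.4, 0.3, 0.9, 1.2, 0.8, 0.2, 0.5]
    print(multilevel_block(feat))
```

Changing a value in the second patch leaves the first patch's `patch_level` outputs and the first position's `pixel_level` output unchanged, but it alters the `map_level` output everywhere. This is the sense in which the coarsest level contributes a global receptive field.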
Environmental Sciences; Imaging Science & Photographic Technology; Remote Sensing; Geosciences, Multidisciplinary
What problem does this paper attempt to address?
The paper addresses two challenges in existing monocular height estimation methods:
1. **Difficulty in achieving both instance-level height and edge reconstruction quality**: existing methods struggle to reconstruct instance-level heights and sharp edges with high quality at the same time.
2. **High computational complexity**: existing methods are computationally expensive, which limits their deployment and real-time performance in practical applications.

To address these issues, the authors propose HeightFormer, a monocular height estimation method that combines multilevel interaction with image-adaptive classification–regression. It aims to improve the quality of instance-level height estimation while reducing computational complexity, using two components:
- **Multilevel Interaction Backbone (MIB)**: supplements the fixed sampling grid of conventional convolutional neural networks (CNNs) with tokens of different interaction ranges. It adds a pixel-, patch-, and feature map-level hierarchical interaction mechanism that relays spatial geometry information across scales and introduces a global receptive field, which improves instance-level height estimation.
- **Image-adaptive Classification–regression Height Generator (ICG)**: dynamically generates a height partition for each image and reframes the traditional regression task as a coarse-to-fine classification–regression process. This substantially mitigates the inherent ill-posedness of the problem and greatly improves edge sharpness.

In experiments on the Vaihingen and Potsdam datasets, HeightFormer outperforms existing techniques.
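The classification–regression idea can be sketched per pixel in plain Python. The sketch assumes an AdaBins-style formulation that the paper does not confirm. Per-image scores are normalized into bin widths that tile a height range. A pixel's height is then read out in two ways: as the center of its most probable bin (coarse, classification) or as the probability-weighted sum of bin centers (fine, regression). The softmax width normalization, the `h_min`/`h_max` range, and the function names `adaptive_bin_centers` and `classify_regress` are illustrative assumptions.

```python
import math
from typing import List, Tuple

# Hypothetical AdaBins-style sketch; NOT the paper's ICG implementation.


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_bin_centers(width_logits: List[float], h_min: float, h_max: float) -> List[float]:
    """Turn per-image width scores into bin centers that tile [h_min, h_max].

    Assumption: widths are a softmax over the scores, scaled to the height range.
    """
    widths = [w * (h_max - h_min) for w in softmax(width_logits)]
    centers: List[float] = []
    left = h_min
    for w in widths:
        centers.append(left + w / 2)  # center of the current bin
        left += w                     # advance to the next bin's left edge
    return centers


def classify_regress(class_logits: List[float], centers: List[float]) -> Tuple[float, float]:
    """Return (coarse, fine) height for one pixel.

    coarse: center of the most probable bin (pure classification)
    fine:   probability-weighted sum of bin centers (continuous regression)
    """
    probs = softmax(class_logits)
    coarse = centers[max(range(len(probs)), key=probs.__getitem__)]
    fine = sum(p * c for p, c in zip(probs, centers))
    return coarse, fine


if __name__ == "__main__":
    centers = adaptive_bin_centers([0.0, 0.0, 0.0, 0.0], 0.0, 40.0)
    print(centers)  # [5.0, 15.0, 25.0, 35.0]
    print(classify_regress([2.0, 1.0, 0.0, -1.0], centers))
```

In a real network, `width_logits` would be predicted from each image, so the height partition adapts per image. The weighted sum keeps the output continuous instead of snapping every pixel to a bin center.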