THE Benchmark: Transferable Representation Learning for Monocular Height Estimation

Zhitong Xiong, Wei Huang, Jingtao Hu, Xiao Xiang Zhu
2023-09-21
Abstract: Generating 3D city models rapidly is crucial for many applications. Monocular height estimation is one of the most efficient and timely ways to obtain large-scale geometric information. However, existing works focus primarily on training and testing models using unbiased datasets, which does not align well with real-world applications. Therefore, we propose a new benchmark dataset to study the transferability of height estimation models in a cross-dataset setting. To this end, we first design and construct a large-scale benchmark dataset for cross-dataset transfer learning on the height estimation task. This benchmark dataset includes a newly proposed large-scale synthetic dataset, a newly collected real-world dataset, and four existing datasets from different cities. Next, we design a new experimental protocol, few-shot cross-dataset transfer. Furthermore, we propose a scale-deformable convolution module to enhance the window-based Transformer for handling the scale-variation problem in the height estimation task. Experimental results demonstrate the effectiveness of the proposed methods in both the traditional and cross-dataset transfer settings. The datasets and codes are publicly available at https://mediatum.ub.tum.de/1662763 and https://thebenchmarkh.github.io/.
Computer Vision and Pattern Recognition