Abstract: Understanding the 3D geometric structure of the Earth's surface has been an active research topic in the photogrammetry and remote sensing community for decades, serving as an essential building block for applications such as 3D digital city modeling, change detection, and city management. Previous work has extensively studied height estimation from aerial images based on stereo or multi-view image matching. These methods require two or more images taken from different perspectives, together with camera information, to reconstruct 3D coordinates. In this paper, we deal with the ambiguous and unsolved problem of height estimation from a single aerial image. Driven by the great success of deep learning, especially deep convolutional neural networks (CNNs), some studies have proposed to estimate height from a single aerial image by training a deep CNN model on large-scale annotated datasets. These methods treat height estimation as a regression problem and directly use an encoder-decoder network to regress the height values. We instead propose to divide height values into spacing-increasing intervals and transform the regression problem into an ordinal regression problem, using an ordinal loss for network training. To enable multi-scale feature extraction, we further incorporate an Atrous Spatial Pyramid Pooling (ASPP) module, which extracts features from multiple dilated convolution layers. A post-processing technique is then designed to merge the predicted height maps of individual patches into a seamless height map. Finally, we conduct extensive experiments on the ISPRS Vaihingen and Potsdam datasets. Experimental results show that our method performs significantly better than state-of-the-art methods.
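The abstract does not give the discretization formula, so the following is only a minimal sketch of how spacing-increasing intervals and ordinal targets could look. It assumes the log-space discretization used in the Deep Ordinal Regression Network listed on this page, with a shift so that the lowest edge maps to 1 before taking logarithms. The function names, the height range, and the number of bins are all hypothetical, and the loss shown is a plain sum of binary cross-entropies over the thresholds, which may differ from the authors' exact ordinal loss.

```python
import math


def sid_thresholds(h_min, h_max, num_bins):
    """Return num_bins + 1 interval edges whose spacing grows with height."""
    # Assumption: shift the range so the lowest edge is 1 and log() is defined.
    shift = 1.0 - h_min
    lo, hi = h_min + shift, h_max + shift
    return [
        math.exp(math.log(lo) + i * math.log(hi / lo) / num_bins) - shift
        for i in range(num_bins + 1)
    ]


def height_to_label(height, edges):
    """Index of the interval containing height, clamped to the valid bins."""
    num_bins = len(edges) - 1
    label = sum(1 for edge in edges[1:-1] if height >= edge)
    return min(label, num_bins - 1)


def ordinal_targets(label, num_bins):
    """Binary targets: entry k is 1 when the true label is greater than k."""
    return [1 if label > k else 0 for k in range(num_bins - 1)]


def ordinal_loss(probabilities, targets, eps=1e-7):
    """Sum of binary cross-entropies, one per ordinal threshold."""
    total = 0.0
    for p, t in zip(probabilities, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total


def decode(probabilities, edges, threshold=0.5):
    """Count thresholds passed, then return the midpoint of that interval."""
    label = sum(1 for p in probabilities if p >= threshold)
    return 0.5 * (edges[label] + edges[label + 1])


if __name__ == "__main__":
    edges = sid_thresholds(0.0, 40.0, 8)  # hypothetical range in metres
    label = height_to_label(5.0, edges)
    targets = ordinal_targets(label, 8)
    print(edges, label, targets)
```

Because the edges are evenly spaced in log space, low heights get narrow intervals and tall structures get wide ones, so the classification is finest where small height errors matter most.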

Multi-Scale Spatio-Temporal Feature Extraction And Depth Estimation From Sequences By Ordinal Classification

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Binocular Depth Estimation Using Convolutional Neural Network With Siamese Branches

Monocular Depth Estimation Based on Multi-Scale Graph Convolution Networks

Least Square Estimation Network for Depth Completion

[Figure: network architecture. Input, Dense Feature Extractor (Convs), Scene Understanding Modular (Full-image Encoder, ASPP), Ordinal Regression, Output]

Deep Ordinal Regression Network for Monocular Depth Estimation

Multi-scale Depth Classification Network for Monocular Depth Estimation

Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks

Height estimation from single aerial images using a deep ordinal regression network

SwinDepth: Unsupervised Depth Estimation using Monocular Sequences via Swin Transformer and Densely Cascaded Network

A Self-Supervised Method of Single-Image Depth Estimation by Feeding Forward Information Using Max-Pooling Layers

MSFNet: Multi-scale features network for monocular depth estimation

From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation

Exploiting Temporal Consistency for Real-Time Video Depth Estimation

Video object segmentation by Multi-Scale Pyramidal Multi-Dimensional LSTM with generated depth context

Monocular depth estimation with hierarchical fusion of dilated CNNs and soft-weighted-sum inference

Estimating Depth from Monocular Images as Classification Using Deep Fully Convolutional Residual Networks

Self-Supervised Monocular Depth Estimation Based on High-Order Spatial Interactions

Cascade Network for Self-Supervised Monocular Depth Estimation