A Sinkhorn Regularized Adversarial Network for Image Guided DEM Super-resolution using Frequency Selective Hybrid Graph Transformer

Subhajit Paul, Ashutosh Gupta
2024-09-22
Abstract: Digital Elevation Model (DEM) is an essential aspect in the remote sensing (RS) domain to analyze various applications related to surface elevations. Here, we address the generation of high-resolution (HR) DEMs using HR multi-spectral (MX) satellite imagery as a guide by introducing a novel hybrid transformer model consisting of Densely connected Multi-Residual Block (DMRB) and multi-headed Frequency Selective Graph Attention (M-FSGA). To promptly regulate this process, we utilize the notion of discriminator spatial maps as the conditional attention to the MX guide. Further, we present a novel adversarial objective related to optimizing Sinkhorn distance with classical GAN. In this regard, we provide both theoretical and empirical substantiation of better performance in terms of vanishing gradient issues and numerical convergence. Based on our experiments on 4 different DEM datasets, we demonstrate both qualitative and quantitative comparisons with available baseline methods and show that the performance of our proposed model is superior to others with sharper details and minimal errors.
Image and Video Processing, Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses super-resolution reconstruction of Digital Elevation Models (DEMs). Specifically, the authors propose an image-guided DEM super-resolution method that uses high-resolution multi-spectral (MX) satellite images as guidance to generate high-resolution DEMs.

#### Main problems and challenges

1. **Insufficient resolution of existing DEMs**: Existing high-resolution DEM products usually require special acquisition and processing techniques and are costly. Generating high-resolution DEMs directly from low-resolution DEMs is therefore a more cost-effective solution.
2. **Limitations of traditional methods**: Traditional interpolation methods (such as linear and bicubic interpolation) perform poorly in high-frequency regions, producing overly smooth output and losing detail. Other reconstruction-based methods can preserve edge information, but they degrade when the magnification factor is large.
3. **Insufficient application of deep-learning methods**: Although deep learning has made significant progress in computer vision, research specifically targeting DEM super-resolution is still limited, especially on real-world datasets.
4. **Problems in GAN training**: Traditional Generative Adversarial Networks (GANs) are prone to mode collapse and vanishing gradients during training, which hurt model performance.

#### Solutions

To address these problems, the authors propose a framework with the following main contributions:

1. **New hybrid Transformer architecture**: Introduces the Densely connected Multi-Residual Block (DMRB) and multi-headed Frequency Selective Graph Attention (M-FSGA), which exploit the information in the high-resolution MX images; the MX guide is conditioned through Discriminative Spatial Self-Attention (DSA).
2. **Sinkhorn-regularized adversarial learning framework (SiRAN)**: Improves GAN training by optimizing the Sinkhorn distance alongside the classical GAN objective, mitigating the vanishing gradient problem and improving numerical convergence.
3. **Construction of real-world datasets**: Uses actual low-resolution SRTM DEM data as input instead of the common bicubic down-sampled high-resolution images, so that the experimental results are more realistic.
4. **Extensive experimental verification**: Experiments on 4 different DEM datasets show that the proposed method outperforms the baselines both qualitatively and quantitatively, especially in detail preservation and error minimization.

In conclusion, this paper provides an efficient and robust DEM super-resolution solution by combining image guidance, a frequency-selective graph attention mechanism, and Sinkhorn-regularized adversarial learning.
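
To make the Sinkhorn distance mentioned above more concrete, the following is a minimal NumPy sketch of entropy-regularized optimal transport between two small point clouds, computed with log-domain Sinkhorn iterations. It is an illustration under stated assumptions, not the authors' SiRAN objective: the function names (`sinkhorn_distance`, `_logsumexp`), the squared Euclidean cost, the uniform sample weights, and the values of `eps` and `n_iters` are all hypothetical choices made for this example, and the summary does not specify how the paper combines this term with the GAN loss.

```python
import numpy as np


def _logsumexp(z, axis):
    """Numerically stable log(sum(exp(z))) along one axis."""
    zmax = z.max(axis=axis, keepdims=True)
    out = zmax + np.log(np.exp(z - zmax).sum(axis=axis, keepdims=True))
    return out.squeeze(axis)


def sinkhorn_distance(x, y, eps=0.5, n_iters=200):
    """Entropy-regularized transport cost between two point clouds.

    x: (n, d) array, y: (m, d) array; both get uniform weights (assumption).
    Returns the transport cost <plan, cost> and the transport plan.
    """
    n, m = x.shape[0], y.shape[0]
    log_a = np.full(n, -np.log(n))  # log of uniform source weights
    log_b = np.full(m, -np.log(m))  # log of uniform target weights

    # Pairwise squared Euclidean cost matrix, shape (n, m).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)

    # Dual potentials, updated alternately so each marginal is matched in turn.
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iters):
        f = eps * (log_a - _logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (log_b - _logsumexp((f[:, None] - cost) / eps, axis=0))

    # Recover the transport plan from the potentials.
    plan = np.exp((f[:, None] + g[None, :] - cost) / eps)
    return float((plan * cost).sum()), plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(32, 2))   # stand-in for "real" samples
    fake = real + 3.0                 # stand-in for "generated" samples
    d_same, _ = sinkhorn_distance(real, real)
    d_shift, _ = sinkhorn_distance(real, fake)
    print("distance to itself:", d_same)
    print("distance to shifted copy:", d_shift)
```

The point of the sketch is that the cost stays informative (it grows with the shift) even when the two sample sets do not overlap, which is the usual intuition for why transport-based terms can help when a classical GAN loss saturates. The log-domain updates are used here only to avoid underflow for small `eps`.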