Uncertainty-driven mixture convolution and transformer network for remote sensing image super-resolution

Xiaomin Zhang
DOI: https://doi.org/10.1038/s41598-024-59384-x
IF: 4.6
2024-04-25
Scientific Reports
Abstract: Recently, convolutional neural networks (CNNs) and Transformer-based networks have delivered promising results in remote sensing image super-resolution (RSISR). Nevertheless, how to effectively fuse the inductive bias inherent in CNNs with the long-range modeling capability of the Transformer architecture remains largely unexplored in RSISR. Accordingly, we propose an uncertainty-driven mixture convolution and transformer network (UMCTN) to improve performance. Specifically, to acquire multi-scale and hierarchical features, UMCTN adopts a U-shape architecture. The encoder and decoder both use residual dual-view aggregation groups (RDAGs) built from dual-view aggregation blocks (DABs), while a dense-sparse transformer group (DSTG) is introduced only in the latent layer. This design avoids the considerable quadratic complexity inherent in vanilla Transformer structures. Moreover, we introduce a novel uncertainty-driven loss (UDL) to steer the network's attention towards pixels exhibiting significant variance, with the primary objective of improving reconstruction quality in texture and edge regions. Experimental results on the UCMerced LandUse and AID datasets show that UMCTN achieves state-of-the-art performance compared with existing methods.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses a problem in remote sensing image super-resolution (RSISR): how to effectively fuse the local inductive bias of convolutional neural networks (CNNs) with the long-range modeling ability of the Transformer architecture so as to improve the reconstruction quality of remote sensing images, especially the restoration of detail in texture and edge regions. Existing methods handle these details poorly. Traditional CNN-based methods struggle to capture global structure information, while Transformer-based methods can model long-range dependencies but have high computational complexity and may ignore high-frequency details.
To solve these problems, the authors propose an uncertainty-driven mixture convolution and Transformer network (UMCTN). UMCTN extracts local detail information efficiently through the Residual Dual-view Aggregation Group (RDAG), and uses the Dense-Sparse Transformer Block (DSTB) in the latent layer to model global structure information and non-local dependencies. In addition, the authors introduce a new uncertainty-driven loss (UDL), which makes the network focus on pixels with significant variance, especially in texture and edge regions, thereby improving the reconstruction quality of those regions.
Specifically, the main contributions of UMCTN include:
1. Proposing a new RSISR method, UMCTN, which combines the advantages of CNNs and Transformers and integrates an adaptive loss mechanism.
2. Designing a hybrid feature exploration network aimed at effectively capturing and faithfully restoring high-frequency details in remote sensing images.
3. Introducing an uncertainty-driven loss, enabling the network to dynamically focus on complex high-frequency regions and endowing the network with spatial adaptability.
4. Reporting experimental results on two public datasets which show that UMCTN performs well on both objective metrics and subjective visual quality, verifying its effectiveness.
Through these innovations, UMCTN aims to overcome the limitations of existing methods and provide a more efficient and higher-quality solution for remote sensing image super-resolution.
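The idea behind an uncertainty-driven loss, weighting each pixel's reconstruction error by how uncertain that pixel is, can be illustrated with a small sketch. The summary above does not give the UDL formula, so the following is not the authors' definition. It assumes a hypothetical per-pixel log-variance estimate `log_var` and scales a plain L1 error by each pixel's variance relative to the image mean; the function name `variance_weighted_l1` and the normalization are illustrative choices only.

```python
import math

def variance_weighted_l1(pred, target, log_var):
    """L1 loss in which each pixel's error is scaled by its relative variance.

    pred, target, log_var: equal-length flat sequences of floats, one per pixel.
    log_var is a hypothetical per-pixel log-variance estimate (an assumption;
    the summary does not say how the paper estimates uncertainty).
    """
    if not (len(pred) == len(target) == len(log_var)) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    var = [math.exp(s) for s in log_var]
    mean_var = sum(var) / len(var)
    # Normalize so the average weight is 1: uniform variance reduces to plain L1.
    weights = [v / mean_var for v in var]
    errors = [abs(t - p) for p, t in zip(pred, target)]
    return sum(w * e for w, e in zip(weights, errors)) / len(errors)

# Toy example: four pixels; only the last one (an "edge" pixel) is mispredicted.
pred = [0.50, 0.50, 0.50, 0.20]
target = [0.50, 0.50, 0.50, 0.80]
flat = [0.0, 0.0, 0.0, 0.0]                # equal variance everywhere
edgy = [0.0, 0.0, 0.0, math.log(5.0)]      # edge pixel has 5x the variance

plain_l1 = variance_weighted_l1(pred, target, flat)   # about 0.15
weighted = variance_weighted_l1(pred, target, edgy)   # about 0.375
```

With uniform variance the function reduces to the ordinary mean absolute error, while a high-variance pixel contributes more to the loss, which is the qualitative behavior the summary attributes to UDL. In an actual training loop the weights would presumably be treated as constants rather than differentiated through, but that detail is likewise an assumption here.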