Abstract: Convolutional neural networks (CNNs) learn strong image priors from large datasets and have therefore been widely used for image restoration. Recently, there has been significant progress in another category of neural architectures, Transformers, which have demonstrated remarkable performance in natural language tasks and high‐level vision applications. Although Transformers address some of CNNs' limitations, such as restricted receptive fields and limited adaptability, they struggle with high‐resolution images because the computational cost of self‐attention grows quadratically with spatial resolution. As a result, applying them to most high‐resolution image restoration tasks is impractical. In this work, we introduce a novel Transformer model, named DehFormer, that modifies the fundamental building blocks of the Transformer, namely the multi‐head attention and the feed‐forward network. The proposed architecture consists of three modules: (a) the multi‐scale feature aggregation network (MSFAN), (b) the gated‐Dconv feed‐forward network (GFFN), and (c) the multi‐Dconv head transposed attention (MDHTA). In the MDHTA module, we reformulate scaled dot‐product attention using element‐wise products computed directly in the frequency domain, which avoids matrix multiplications and improves efficiency. The GFFN module allows only relevant and useful information to advance through the network hierarchy, which improves the flow of information within the model. Extensive experiments on the SateHaze1k, RS‐Haze, and RSID datasets show that DehFormer significantly outperforms existing methods.
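The two mechanisms named in the abstract, element-wise products in the frequency domain and gating in the feed-forward network, can be illustrated with a minimal NumPy sketch. This is not the DehFormer implementation: the function names (`gated_ffn`, `freq_elementwise_mix`), the plain linear weights, the GELU gate, and the omission of the depth-wise convolutions are all assumptions made for illustration, and the abstract does not say exactly how the spectra of queries and keys are combined.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def gated_ffn(x, w_gate, w_value, w_out):
    """Hypothetical gated feed-forward layer (depth-wise convs omitted).

    x: (tokens, channels). One branch is passed through GELU and
    multiplied element-wise with the other, so features whose gate is
    close to zero are suppressed before the output projection.
    """
    gate = gelu(x @ w_gate)      # gating branch
    value = x @ w_value          # content branch
    return (gate * value) @ w_out


def freq_elementwise_mix(q, k):
    """Hypothetical frequency-domain interaction of two 2-D feature maps.

    Multiplying the 2-D spectra element-wise is, by the convolution
    theorem, the same as circularly convolving q with k in the spatial
    domain, but costs O(HW log HW) instead of forming an (HW x HW) matrix.
    """
    q_spec = np.fft.fft2(q)
    k_spec = np.fft.fft2(k)
    return np.real(np.fft.ifft2(q_spec * k_spec))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((8, 8))
    k = rng.standard_normal((8, 8))
    print(freq_elementwise_mix(q, k).shape)

    x = rng.standard_normal((64, 16))           # 64 tokens, 16 channels
    w_gate = rng.standard_normal((16, 32))
    w_value = rng.standard_normal((16, 32))
    w_out = rng.standard_normal((32, 16))
    print(gated_ffn(x, w_gate, w_value, w_out).shape)
```

The frequency-domain function never materializes a token-by-token attention matrix, which is the source of the quadratic cost mentioned above; whether DehFormer uses a plain product, a conjugate product (correlation), or additional normalization is not stated in the abstract.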

An efficient multi‐scale transformer for satellite image dehazing