Frequency domain-enhanced transformer for single image deraining

Mingwen Shao, Zhiyuan Bao, Weihan Liu, Yuanjian Qiao, Yecong Wan
DOI: https://doi.org/10.1007/s00371-023-03252-8
IF: 2.835
2024-02-15
The Visual Computer
Abstract: Since Transformers show a strong capability for building long-range dependencies, Transformer-based methods are extensively employed for image deraining tasks. However, the intrinsic limitations of Transformers, including high computational complexity and an insufficient ability to capture the high-frequency components of an image, hinder their use on high-resolution images and lead to unsatisfactory recovery of local edges and textures. To overcome these limitations, we propose a simple but effective Frequency Domain Enhanced Transformer (FDEFormer) for image deraining. Firstly, drawing inspiration from the convolution theorem, we devise an efficient approach called frequency domain enhanced multi-head self-attention. The proposed approach replaces traditional matrix multiplication with element-wise product operations in the frequency domain, leading to a substantial reduction in computational complexity. Secondly, existing spatial-domain Transformer-based methods focus mainly on low-frequency features and pay less attention to high-frequency components, which can adversely affect the quality of the reconstructed images. Therefore, to integrate content from different frequency levels, we propose a dual domain-complemented feed-forward network. In addition, we present an attention feature fusion module to facilitate more effective fusion of features across different layers. Extensive experiments on several datasets demonstrate that our FDEFormer performs favorably against state-of-the-art methods at an acceptable computational cost. The source code and pre-trained models are available at https://github.com/bobozy1999/FDEFormer.
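The convolution theorem mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the FDEFormer attention module; it only checks, on a 1-D toy signal, that a circular convolution computed with an O(n^2) double loop matches an element-wise product of Fourier transforms. The function names `circular_conv_direct` and `circular_conv_fft` and the random test vectors are assumptions made for illustration.

```python
import numpy as np


def circular_conv_direct(x, h):
    """Circular convolution by explicit summation, O(n^2) operations."""
    n = len(x)
    out = np.zeros(n)
    for i in range(n):
        for j in range(n):
            out[i] += x[j] * h[(i - j) % n]
    return out


def circular_conv_fft(x, h):
    """Same result via the convolution theorem: element-wise product
    of the two spectra followed by an inverse FFT, O(n log n)."""
    spectrum = np.fft.fft(x) * np.fft.fft(h)
    # Inputs are real, so the imaginary part is only round-off noise.
    return np.real(np.fft.ifft(spectrum))


# Toy inputs (hypothetical data, fixed seed for reproducibility).
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
h = rng.standard_normal(8)

direct = circular_conv_direct(x, h)
via_fft = circular_conv_fft(x, h)
agree = bool(np.allclose(direct, via_fft))
```

The two routes agree up to floating-point error, which is the property the abstract relies on when it trades matrix multiplication for element-wise products in the frequency domain.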
computer science, software engineering