Abstract: Although deep convolutional neural networks have achieved remarkable success in removing synthetic fog, real-world use requires processing images captured under complex foggy conditions, such as dense or non-homogeneous fog. However, real-world haze distributions are complex, and as downsampling reduces the resolution of the image or feature map, it can cause color distortion or loss of detail in the output. Beyond the difficulty of obtaining sufficient training data, deep learning methods for foggy image processing are also prone to overfitting, which limits the model's generalization ability and hinders its practical use in real-world scenarios. To address these issues, this paper proposes a Transformer-based wavelet network (WaveletFormerNet) for real-world foggy image recovery. We embed the discrete wavelet transform into the Vision Transformer through the proposed WaveletFormer and IWaveletFormer blocks, aiming to alleviate the texture detail loss and color distortion that downsampling introduces. We introduce parallel convolution in the Transformer block, which captures multi-frequency information with a lightweight mechanism. Additionally, we implement a feature aggregation module (FAM) that maintains image resolution and enhances the feature extraction capacity of the model, further improving its performance on real-world foggy image recovery. Extensive quantitative and qualitative experiments demonstrate that WaveletFormerNet outperforms state-of-the-art methods while keeping model complexity low. Our results on real-world dust removal and on application tests further show the strong generalization ability and improved performance of WaveletFormerNet in computer vision applications.
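The abstract motivates embedding the discrete wavelet transform by the detail loss that ordinary downsampling causes. The sketch below illustrates why a wavelet transform avoids that loss: a single-level 2D Haar DWT halves the spatial resolution but splits the signal into four sub-bands that together are exactly invertible, whereas stride-2 subsampling simply discards three of every four pixels. This is a minimal sketch under stated assumptions, not the WaveletFormer or IWaveletFormer block: it assumes the Haar wavelet, a single-channel NumPy array with even height and width, and hypothetical helper names `haar_dwt2` and `haar_idwt2`. The sub-band naming follows one common convention; libraries differ.

```python
import numpy as np


def haar_dwt2(x):
    """Hypothetical helper: single-level orthonormal 2D Haar DWT.

    Assumes x is a 2D array with even height and width. Returns four
    half-resolution sub-bands (LL, LH, HL, HH).
    """
    x = np.asarray(x, dtype=float)
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # top minus bottom: horizontal edges
    hl = (a - b + c - d) / 2.0  # left minus right: vertical edges
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Hypothetical helper: exact inverse of haar_dwt2."""
    a = (ll + lh + hl + hh) / 2.0
    b = (ll + lh - hl - hh) / 2.0
    c = (ll - lh + hl - hh) / 2.0
    d = (ll - lh - hl + hh) / 2.0
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    # Scatter each recovered pixel back to its position in the 2x2 block.
    x[0::2, 0::2] = a
    x[0::2, 1::2] = b
    x[1::2, 0::2] = c
    x[1::2, 1::2] = d
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((8, 8))

    bands = haar_dwt2(img)
    # Stacking the sub-bands as channels gives (4, H/2, W/2): the same
    # number of values as the input, so nothing is thrown away.
    stacked = np.stack(bands)
    print("sub-band stack shape:", stacked.shape)
    print("max reconstruction error:", np.abs(haar_idwt2(*bands) - img).max())

    # Plain stride-2 subsampling keeps only a quarter of the pixels.
    print("strided shape:", img[::2, ::2].shape)
```

In a wavelet-based network, the stacked sub-bands would typically be processed as extra channels in place of a strided convolution or pooling layer, and the inverse transform would restore resolution in the decoder; how WaveletFormerNet wires this into its Transformer blocks is described in the paper, not here.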

WaveFormer: Wavelet Transformer for Noise-Robust Video Inpainting

WTVI: A Wavelet-Based Transformer Network for Video Inpainting

WaveFill: A Wavelet-based Generation Network for Image Inpainting

Deep Transformer Based Video Inpainting Using Fast Fourier Tokenization

DeViT: Deformed Vision Transformers in Video Inpainting

ProPainter: Improving Propagation and Transformer for Video Inpainting

Wavelet Prior Attention Learning in Axial Inpainting Network

MAT: Mask-Aware Transformer for Large Hole Image Inpainting

HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention

Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection

Sparse self-attention transformer for image inpainting

Learning Joint Spatial-Temporal Transformations for Video Inpainting

WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning

FSTT: Flow-Guided Spatial Temporal Transformer for Deep Video Inpainting

Decoupled Spatial-Temporal Transformer for Video Inpainting

Delving Globally into Texture and Structure for Image Inpainting

WaveletFormerNet: A Transformer-based Wavelet Network for Real-world Non-homogeneous and Dense Fog Removal

Flow-Guided Transformer for Video Inpainting

Towards Online Real-Time Memory-based Video Inpainting Transformers