U²-Former: Nested U-Shaped Transformer for Image Restoration Via Multi-View Contrastive Learning

Xin Feng, Haobo Ji, Wenjie Pei, Jinxing Li, Guangming Lu, David Zhang
DOI: https://doi.org/10.1109/tcsvt.2023.3286405
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: While the Transformer has achieved remarkable performance in various high-level vision tasks, it is still challenging to exploit its full potential in image restoration. The crux lies in the limited depth at which the Transformer can be applied in the typical encoder-decoder framework for image restoration, a limit that results from the heavy computational load of self-attention and from inefficient communication across layers of different depths (scales). In this paper, we present a deep and effective Transformer-based network for image restoration, termed U²-Former, which employs the self-attention of the Transformer as the core operation for feature learning and performs image restoration in a deep encoding and decoding space. Specifically, it leverages a nested U-shaped structure to facilitate interactions across layers with feature maps of different scales. Furthermore, we improve the computational efficiency of the basic Transformer block by introducing a simple yet effective feature-filtering mechanism that compresses the token representation. Apart from the typical forms of supervision for image restoration, U²-Former also performs multi-view contrastive learning, which constructs positive pairs from various aspects, to learn noise-sensitive but content-irrelevant features and further decouple the noise component from the background image. Extensive experiments on various image restoration tasks, including reflection removal, rain streak removal, and dehazing, demonstrate the effectiveness of the proposed U²-Former.
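To make the computational argument in the abstract concrete, the following is a minimal back-of-the-envelope sketch of how the cost of one self-attention layer depends on the number of tokens and on the per-token feature size. It is not the authors' implementation: the function name `attention_macs`, the cost model (single head, softmax and biases ignored), and all sizes are assumptions chosen for illustration, and treating "compressing the token representation" as a reduction of channels per token is likewise an assumption, since the abstract does not specify the mechanism.

```python
def attention_macs(num_tokens: int, dim: int) -> int:
    """Rough multiply-accumulate count for one single-head self-attention layer.

    Counts the Q, K, V and output projections (4 * N * d^2) and the two
    token-to-token products QK^T and AV (2 * N^2 * d). Softmax and biases
    are ignored, so this is an estimate, not an exact figure.
    """
    projections = 4 * num_tokens * dim * dim
    token_mixing = 2 * num_tokens * num_tokens * dim
    return projections + token_mixing


# Hypothetical sizes, chosen only for illustration (not taken from the paper).
tokens = 64 * 64       # a 64x64 feature map flattened into tokens
full_dim = 64          # channels per token before compression
kept_dim = 16          # channels per token after an assumed filtering step

full_cost = attention_macs(tokens, full_dim)
compressed_cost = attention_macs(tokens, kept_dim)
coarser_scale_cost = attention_macs(32 * 32, full_dim)  # half the resolution

print(f"full:          {full_cost:,} MACs")
print(f"fewer channels: {compressed_cost:,} MACs")
print(f"coarser scale:  {coarser_scale_cost:,} MACs")
```

Under this cost model the token-to-token term grows quadratically with the number of tokens, which is one reason attention is cheaper at the coarser scales of an encoder-decoder, while reducing the channels per token shrinks both terms at a fixed resolution.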