Abstract: Image deblurring continues to achieve impressive performance with the development of generative models. Nonetheless, it remains difficult to improve the perceptual quality and the quantitative scores of a recovered image at the same time. In this study, drawing inspiration from research on transformer properties, we introduce pretrained transformers to address this problem. In particular, we leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing the performance measured by quantitative metrics. The pretrained transformer can capture the global topological relations (i.e., self-similarity) of an image, and we observe that the topological relations captured for a sharp image change when blur occurs. By comparing the transformer features of the recovered image with those of the target image, the pretrained transformer provides high-resolution, blur-sensitive semantic information, which is critical for measuring the sharpness of the deblurred image. Building on these advantages, we present two types of novel perceptual losses to guide image deblurring. The first regards the features as vectors and computes the discrepancy between the representations extracted from the recovered image and the target image in Euclidean space. The second treats the features extracted from an image as a distribution and measures the distribution discrepancy between the recovered image and the target image. We demonstrate the effectiveness of transformer properties in improving perceptual quality without sacrificing the quantitative score, peak signal-to-noise ratio (PSNR), over the most competitive models, such as Uformer, Restormer, and NAFNet, on defocus deblurring and motion deblurring tasks. The code is available at https://github.com/erfect2020/TransformerPerceptualLoss.
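
The two kinds of feature comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the features are plain lists of token vectors standing in for ViT outputs, and the per-channel one-dimensional Wasserstein-1 distance is an assumed stand-in for whatever distribution discrepancy the paper actually uses.

```python
from typing import Sequence

# Hypothetical stand-in for ViT features: one vector of channel values per token.
Features = Sequence[Sequence[float]]


def euclidean_feature_loss(pred: Features, target: Features) -> float:
    """Mean squared Euclidean distance between corresponding token vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of tokens")
    total = 0.0
    for p, t in zip(pred, target):
        total += sum((a - b) ** 2 for a, b in zip(p, t))
    return total / len(pred)


def distribution_feature_loss(pred: Features, target: Features) -> float:
    """Assumed stand-in for a distribution discrepancy.

    Treats each channel's token values as an empirical 1-D sample and
    averages the Wasserstein-1 distance over channels. For equal-size
    samples this is the mean absolute difference of the sorted values.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of tokens")
    n_channels = len(pred[0])
    total = 0.0
    for c in range(n_channels):
        ps = sorted(tok[c] for tok in pred)
        ts = sorted(tok[c] for tok in target)
        total += sum(abs(a - b) for a, b in zip(ps, ts)) / len(ps)
    return total / n_channels


# Same token vectors, but in swapped positions.
pred_feats = [[0.0, 0.0], [1.0, 1.0]]
target_feats = [[1.0, 1.0], [0.0, 0.0]]
vector_loss = euclidean_feature_loss(pred_feats, target_feats)
dist_loss = distribution_feature_loss(pred_feats, target_feats)
```

In this toy case the vector-style loss is nonzero because it compares tokens position by position, while the distribution-style loss is zero because both feature sets contain the same values. That difference is the reason one might treat the two comparisons as complementary.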

Decoupling Image Deblurring Into Twofold: A Hierarchical Model for Defocus Deblurring

High Quality Image-Pair-based Deblurring Method Using Edge Mask and Improved Residual Deconvolution

Guided Image Deblurring by Deep Multi-Modal Image Fusion

Single-image Defocus Deblurring by Integration of Defocus Map Prediction Tracing the Inverse Problem Computation

Image Deblurring With Image Blurring

Image Deblurring by Exploring In-Depth Properties of Transformer

Efficiently Exploiting Spatially Variant Knowledge for Video Deblurring

Bidirectional Transformer for Video Deblurring

Efficient Fusion of Depth Information for Defocus Deblurring

An Efficient Dehazing Algorithm Based on the Fusion of Transformer and Convolutional Neural Network

Multi-channel Residual Network Model for Accurate Estimation of Spatially-Varying and Depth-Dependent Defocus Kernels

VDTR: Video Deblurring with Transformer

DeFusionNET: Defocus Blur Detection via Recurrently Fusing and Refining Discriminative Multi-Scale Deep Features

SIDGAN: Efficient Multi-Module Architecture for Single Image Defocus Deblurring

Deblurring Videos Using Spatial-Temporal Contextual Transformer With Feature Propagation

An image deblurring method using improved U-Net model based on multilayer fusion and attention mechanism

Learning Dual-Pixel Alignment for Defocus Deblurring

Rethinking Blur Synthesis for Deep Real-World Image Deblurring

Broad Spectrum Image Deblurring via an Adaptive Super-Network

A Deep Convolutional Encoder–Decoder–Restorer Architecture for Image Deblurring

Defocus Image Deblurring Network With Defocus Map Estimation as Auxiliary Task