Abstract: Deep learning methods have achieved impressive performance in compressed video quality enhancement. However, these methods rely heavily on practical experience, with network structures designed by hand, and do not fully exploit the feature information contained in video sequences: they neither take full advantage of the multiscale similarity of compression artifacts nor seriously consider the impact of partition boundaries in the compressed video on overall video quality. In this article, we propose a novel Mixed Difference Equation inspired Transformer (MDEformer) for compressed video quality enhancement, which provides a relatively reliable principle to guide network design and yields new insight into interpretable transformers. Specifically, drawing on the graphical concept of the mixed difference equation (MDE), we use multiple cross-layer cross-attention aggregation (CCA) modules to establish long-range dependencies between the encoders and decoders of the transformer, with partition boundary smoothing (PBS) modules inserted as feedforward networks. The CCA module exploits the multiscale similarity of compression artifacts to remove them effectively and to recover the texture and detail of each frame. The PBS module leverages the sensitivity of smoothing convolution to partition boundaries to eliminate their impact on the quality of the compressed video and improve overall quality, without unduly affecting non-boundary pixels. Extensive experiments on the MFQE 2.0 dataset demonstrate that the proposed MDEformer removes compression artifacts and improves the quality of compressed video, surpassing state-of-the-art (SOTA) methods in both objective metrics and visual quality.
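The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. This is not the MDEformer implementation: the abstract does not give the layer definitions, so every name here (`cross_attention`, `box_smooth`, the projection matrices, the token shapes) is a hypothetical stand-in. The first function shows generic scaled dot-product cross-attention in which decoder tokens query encoder tokens, as a rough analogue of the CCA module. The second uses a fixed 3x3 box filter in place of a learned smoothing convolution, to show why smoothing changes pixels next to a partition boundary while leaving flat regions untouched, which is the property the PBS module is said to exploit.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(dec_tokens, enc_tokens, w_q, w_k, w_v):
    # Queries come from the decoder; keys and values come from the encoder.
    # (Assumed generic formulation, not the paper's exact CCA design.)
    q = dec_tokens @ w_q
    k = enc_tokens @ w_k
    v = enc_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def box_smooth(frame, k=3):
    # Fixed k x k mean filter with edge padding; a stand-in for a learned
    # smoothing convolution.
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    h, w = frame.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

rng = np.random.default_rng(0)
dec_tokens = rng.standard_normal((4, 8))   # 4 decoder tokens, 8 channels
enc_tokens = rng.standard_normal((6, 8))   # 6 encoder tokens, 8 channels
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
attended, weights = cross_attention(dec_tokens, enc_tokens, w_q, w_k, w_v)

# Toy frame with a hard step between two 8-pixel-wide blocks.
frame = np.zeros((16, 16))
frame[:, 8:] = 1.0
smoothed = box_smooth(frame)
```

In this toy frame only the two columns adjacent to the block edge (columns 7 and 8) change after smoothing; the interior of each block is left as it was.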

Transformer-Based Video Deinterlacing Method

Vision Transformers for Single Image Dehazing

VDTR: Video Deblurring with Transformer

Bidirectional Transformer for Video Deblurring

De-interlacing with weighted edge adaptive intra-field interpolation

An Efficient Dehazing Algorithm Based on the Fusion of Transformer and Convolutional Neural Network

Aggregating Long-term Sharp Features via Hybrid Transformers for Video Deblurring

Aggregating Nearest Sharp Features via Hybrid Transformers for Video Deblurring

CTFCD: Channel Transformer Based on Full Convolutional Decoder for Single Image Deraining

Stereoscopic video deblurring transformer

Rethinking deinterlacing for early interlaced videos

Multiframe Joint Enhancement for Early Interlaced Videos

Video Demoiréing with Deep Temporal Color Embedding and Video-Image Invertible Consistency

Transformer-based progressive residual network for single image dehazing

VJT: A Video Transformer on Joint Tasks of Deblurring, Low-light Enhancement and Denoising

MDEformer: Mixed Difference Equation Inspired Transformer for Compressed Video Quality Enhancement

Multi-frame Joint Enhancement for Early Interlaced Videos

Learning an Occlusion-Aware Network for Video Deblurring

Improved Transformer-Based Deblurring of Commodity Videos in Dynamic Visual Cabinets

Multi-Field De-interlacing using Deformable Convolution Residual Blocks and Self-Attention

Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency