Abstract: Recently, deep learning-based image compression has made significant progress and has achieved better rate-distortion (R-D) performance than the latest traditional method, H.266/VVC, in both the MS-SSIM metric and the more challenging PSNR metric. However, a major problem is that the complexity of many leading learned schemes is too high. In this paper, we propose an efficient and effective image coding framework that achieves R-D performance similar to the state of the art at lower complexity. First, we develop an improved multi-scale residual block (MSRB) that expands the receptive field and captures global information more efficiently, which further reduces the spatial correlation of the latent representations. Second, we introduce an importance scaling network that directly scales the latents to achieve content-adaptive bit allocation without sending side information, which is more flexible than previous importance map methods. Third, we apply a post-quantization filter (PQF) to reduce the quantization error, motivated by the Sample Adaptive Offset (SAO) filter in video coding. Moreover, our experiments show that the performance of the system is less sensitive to the complexity of the decoder than to that of the encoder. Therefore, we design an asymmetric paradigm in which the encoder employs three stages of MSRBs to improve the learning capacity, whereas the decoder uses only one stage of MSRB, which reduces the decoder complexity and still yields satisfactory performance. Experimental results show that, compared to the state-of-the-art method, the proposed method encodes and decodes about 17 times faster, while its R-D performance is reduced by only about 1% on both the Kodak and Tecnick-40 datasets, which is still better than H.266/VVC (4:4:4) and other leading learning-based methods. Our source code is publicly available at https://github.com/fengyurenpingsheng.
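The link between scaling the latents and bit allocation can be illustrated with a small sketch. It is not the importance scaling network described above: the scale factors are hand-picked assumptions standing in for a learned, content-dependent output, and the function names are hypothetical. It only shows the general principle that rounding a latent after multiplying it by a scale s is equivalent to quantizing it with step size 1/s, so a larger scale gives finer quantization (more distinct symbols, lower error) and a smaller scale gives coarser quantization.

```python
# Illustrative only: hand-picked scales stand in for the output of a learned
# importance scaling network, whose architecture is not specified here.

def quantize_scaled(latents, scales):
    """Multiply each latent by its scale, then round to the nearest integer."""
    return [round(y * s) for y, s in zip(latents, scales)]

def dequantize(symbols, scales):
    """Undo the scaling. Used here only to measure the quantization error;
    how a real decoder accounts for the scale is an assumption left open."""
    return [q / s for q, s in zip(symbols, scales)]

def max_abs_error(latents, scales):
    """Largest absolute difference between latents and their reconstruction."""
    symbols = quantize_scaled(latents, scales)
    recon = dequantize(symbols, scales)
    return max(abs(y - r) for y, r in zip(latents, recon))

latents = [0.30, -1.20, 2.70, 0.45]
coarse = [1.0] * len(latents)  # step size 1: fewer symbols, larger error
fine = [4.0] * len(latents)    # step size 1/4: more symbols, smaller error

coarse_symbols = quantize_scaled(latents, coarse)
fine_symbols = quantize_scaled(latents, fine)
coarse_err = max_abs_error(latents, coarse)
fine_err = max_abs_error(latents, fine)

print(coarse_symbols, coarse_err)
print(fine_symbols, fine_err)
```

In this toy setting the error is bounded by 0.5/s per latent, so assigning a larger scale to important regions spends more bits there and fewer elsewhere. The explicit inverse scaling is only a measuring device for the sketch.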

PFR-VC: Learning-Based Video Compression Framework with Predicted Frame Refinement

Foreground-Background Parallel Compression with Residual Encoding for Surveillance Video

Accelerating Learned Video Compression via Low-Resolution Representation Learning

High Efficiency Deep-learning Based Video Compression

Learning-Based Video Compression Framework With Implicit Spatial Transform for Applications in the Internet of Things

M-LVC: Multiple Frames Prediction for Learned Video Compression

Joint Learned and Traditional Video Compression for P Frame

Temporal context video compression with flow-guided feature prediction

Learned Video Compression with Adaptive Temporal Prior and Decoded Motion-aided Quality Enhancement

Deep Predictive Video Compression Using Mode-Selective Uni- and Bi-Directional Predictions Based on Multi-Frame Hypothesis

Learned Video Compression with Residual Prediction and Loop Filter

Adaptive Prediction Structure for Learned Video Compression

FVC: An End-to-End Framework Towards Deep Video Compression in Feature Space

Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Scaling, and Post-Quantization Filtering

Learning-Based Video Coding with Joint Deep Compression and Enhancement

Improving Learned Video Compression by Exploring Spatial Redundancy

Learned Video Compression with Residual Prediction and Feature-Aided Loop Filter

A Neural-network Enhanced Video Coding Framework beyond ECM

Conditional Entropy Coding for Efficient Video Compression

High Visual-Fidelity Learned Video Compression

Enhanced Motion-Compensated Video Coding with Deep Virtual Reference Frame Generation