Abstract: Recently, learning-based image compression models have attracted much attention due to their impressive performance and ease of optimization compared with traditional DCT- and wavelet-based image compression standards. Most learning-based image compression models are trained to minimize a joint rate-distortion (RD) loss at a single RD trade-off point. However, in many multimedia applications, communication constraints or the need to adapt to different spatial formats, bit rates, or power budgets make it necessary to provide several versions of an image for different client devices. To meet this requirement, typical end-to-end image compression methods have to compress an image into several bitstreams independently, using a number of pre-trained networks, which is resource-consuming because of the redundancy among these streams. To address this problem, inspired by the traditional scalable video coding framework, we propose a learning-based end-to-end quality and spatial scalable image compression (QSSIC) model with a multi-layer structure, in which each layer generates one bitstream corresponding to a specified resolution and image fidelity. This scalability is achieved by exploring the potential of feature-domain representation prediction and reuse. Specifically, first, the bitstreams of previous layers are used to predict the current layer's representations, which contain the enhancement information, so that only the prediction residuals need to be coded in the enhancement layers. Second, the previous bitstreams are reused for image reconstruction in higher layers to provide the basic information. The proposed model can be optimized in an end-to-end manner. Extensive experiments show that our method outperforms state-of-the-art deep neural network (DNN)-based auto-encoders in simulcast scenarios. 
In addition, our method outperforms the scalable extension of H.264/AVC (SVC), a traditional scalable compression method, and is comparable to the scalable extension of H.265/HEVC (SHVC).
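
The layered prediction-and-reuse idea described in the abstract can be illustrated with a toy two-layer coder. This is only a sketch under strong assumptions, not the QSSIC model: the learned feature-domain transforms are replaced by fixed averaging and nearest-neighbour upsampling on a 1-D signal, entropy coding is omitted, and every function name and step size below is hypothetical. It shows a base layer that carries a low-resolution, coarse version, and an enhancement layer that codes only the residual left after predicting from the base layer, with the decoder reusing the base stream.

```python
# Toy two-layer scalable coder (illustrative only; NOT the QSSIC network).
# Assumption: fixed pixel-domain operators stand in for the learned,
# feature-domain transforms described in the abstract.

def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer symbol."""
    return [round(v / step) for v in values]

def dequantize(symbols, step):
    """Inverse of quantize, up to the quantization error."""
    return [s * step for s in symbols]

def downsample(signal):
    """Average adjacent pairs, halving the resolution (length must be even)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]

def upsample(signal):
    """Nearest-neighbour upsampling, doubling the resolution."""
    return [v for v in signal for _ in range(2)]

def encode(signal, base_step=8.0, enh_step=1.0):
    """Return (base_symbols, enhancement_symbols) for a two-layer stream."""
    # Base layer: low resolution, coarse quantization.
    base_symbols = quantize(downsample(signal), base_step)
    # Predict the full-resolution signal from what the decoder will have.
    prediction = upsample(dequantize(base_symbols, base_step))
    # Enhancement layer: code only the prediction residual.
    residual = [x - p for x, p in zip(signal, prediction)]
    enh_symbols = quantize(residual, enh_step)
    return base_symbols, enh_symbols

def decode_base(base_symbols, base_step=8.0):
    """Low-resolution reconstruction from the base stream alone."""
    return dequantize(base_symbols, base_step)

def decode_enhanced(base_symbols, enh_symbols, base_step=8.0, enh_step=1.0):
    """Full-resolution reconstruction that reuses the base stream."""
    prediction = upsample(dequantize(base_symbols, base_step))
    residual = dequantize(enh_symbols, enh_step)
    return [p + r for p, r in zip(prediction, residual)]

signal = [10.0, 12.0, 50.0, 54.0, 200.0, 190.0, 33.0, 31.0]
base, enh = encode(signal)
low = decode_base(base)            # half resolution, coarse fidelity
high = decode_enhanced(base, enh)  # full resolution, finer fidelity
```

A client that receives only the base symbols decodes `low`; a client that also receives the enhancement symbols decodes `high`. In a simulcast setup the full-resolution version would be coded from scratch instead, which is the redundancy the layered design aims to avoid.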

Enhanced Screen Content Image Compression: A Synergistic Approach for Structural Fidelity and Text Integrity Preservation

EfficientSCI: Densely Connected Network with Space-time Factorization for Large-scale Video Snapshot Compressive Imaging

Spatial-Temporal Adaptive Compressed Screen Content Video Quality Enhancement

Subjective and Objective Quality Assessment of Compressed Screen Content Images

Hybrid CNN-Transformer Architecture for Efficient Large-Scale Video Snapshot Compressive Imaging

Image Segmentation For Improved Lossless Screen Content Compression

Content-aware Facial Image Compression with Deep Learning Method

Perceptual Image Compression with Cooperative Cross-Modal Side Information

Collaborative Scalable Visual Compression for Human-Centered Videos

Key frames assisted hybrid encoding for photorealistic compressive video sensing

Key Frames Assisted Hybrid Encoding for High-Quality Compressive Video Sensing

Memory-Efficient Network for Large-scale Video Compressive Sensing

Multi-Task Learning for Screen Content Image Coding

Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity

Asymmetric Learned Image Compression with Multi-Scale Residual Block, Importance Scaling, and Post-Quantization Filtering

Learning-Based Scalable Image Compression With Latent-Feature Reuse and Prediction

SSSIC: Semantics-to-Signal Scalable Image Coding with Learned Structural Representations

ITSRN++: Stronger and Better Implicit Transformer Network for Continuous Screen Content Image Super-Resolution

Deep Motion Regularizer for Video Snapshot Compressive Imaging

Rethinking Semantic Image Compression: Scalable Representation with Cross-modality Transfer

Semantic-assisted image compression