Abstract: Recently, learning-based image compression models have attracted much attention due to their impressive performance and ease of optimization compared with traditional DCT- and wavelet-based image compression standards. Most learning-based image compression models are trained to minimize a joint rate-distortion (RD) loss at a single RD trade-off point. However, in many multimedia applications, communication constraints or the need to adapt to different display formats, bit rates, or power budgets make it necessary to provide several versions of an image for different client devices. To meet this requirement, typical end-to-end image compression methods have to compress an image into several bitstreams independently, using a number of pre-trained networks, which is resource-consuming because of the redundancy among these streams. To address this problem, inspired by the traditional scalable video coding framework, we propose a learning-based end-to-end quality and spatial scalable image compression (QSSIC) model with a multi-layer structure, in which each layer generates one bitstream corresponding to a specified resolution and image fidelity. This scalability is achieved by exploring the potential of feature-domain representation prediction and reuse. Specifically, first, the bitstreams of previous layers are used to predict the representations of the current layer, which contain the enhancement information, so that only prediction residuals need to be coded in the enhancement layers. Second, the previous bitstreams are reused for image reconstruction in higher layers to provide basic information. The proposed model can be optimized in an end-to-end manner. Extensive experiments show that our method outperforms state-of-the-art deep neural network (DNN)-based auto-encoders in simulcast scenarios. In addition, our method performs better than the traditional scalable image compression method based on the scalable extension of H.264/AVC (SVC) and is comparable to the scalable extension of H.265/HEVC (SHVC).
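
The prediction-and-reuse idea described in the abstract can be illustrated with a deliberately simplified sketch. It is not the QSSIC model: the learned feature-domain predictor is replaced here by an identity prediction (an assumption), all layers share one resolution, uniform scalar quantization stands in for learned entropy coding, and every function name (`quantize`, `dequantize`, `encode_layers`, `decode_layers`, `mse`) is hypothetical. It only shows that each enhancement layer codes a residual against what the earlier layers already convey, and that a decoder reuses the earlier streams when reconstructing a higher layer.

```python
def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer symbol."""
    return [round(v / step) for v in values]


def dequantize(symbols, step):
    """Inverse of quantize, up to the quantization error."""
    return [s * step for s in symbols]


def encode_layers(latent, steps):
    """Produce one symbol stream per layer (hypothetical helper).

    Layer 0 codes the latent directly. Every later layer predicts the
    latent from the reconstruction of the previous layers (identity
    prediction, an assumption) and codes only the prediction residual.
    """
    streams = []
    recon = [0.0] * len(latent)
    for step in steps:
        residual = [x - r for x, r in zip(latent, recon)]
        symbols = quantize(residual, step)
        streams.append(symbols)
        recon = [r + d for r, d in zip(recon, dequantize(symbols, step))]
    return streams


def decode_layers(streams, steps, upto):
    """Reconstruct from the first `upto` streams, reusing all lower layers."""
    recon = [0.0] * len(streams[0])
    for symbols, step in zip(streams[:upto], steps[:upto]):
        recon = [r + d for r, d in zip(recon, dequantize(symbols, step))]
    return recon


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy latent and two layers: a coarse base layer and a finer enhancement layer.
latent = [0.9, -2.3, 4.1, 0.2]
steps = [1.0, 0.25]
streams = encode_layers(latent, steps)
base = decode_layers(streams, steps, upto=1)
enhanced = decode_layers(streams, steps, upto=2)
print(mse(latent, base), mse(latent, enhanced))
```

A client that needs only the base quality decodes the first stream alone, while a client that needs higher fidelity decodes both. In this toy setting the second stream carries small residual symbols rather than a second full description of the latent, which is the redundancy that simulcast coding would otherwise incur.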

Hierarchical Reinforcement Learning Based Video Semantic Coding for Segmentation

Task-Driven Semantic Coding via Reinforcement Learning

High-Efficiency Neural Video Compression via Hierarchical Predictive Learning

High Efficiency Deep-learning Based Video Compression

Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos

Robust moving object segmentation in the compressed domain for H.264/AVC video stream

A Coding Framework and Benchmark towards Low-Bitrate Video Understanding

Deep Hierarchical Video Compression

LSSVC: A Learned Spatially Scalable Video Coding Scheme

Collaborative Scalable Visual Compression for Human-Centered Videos

Semantic Neural Rendering-based Video Coding: Towards Ultra-Low Bitrate Video Conferencing

Video saliency aware intelligent HD video compression with the improvement of visual quality and the reduction of coding complexity

Semantically Video Coding: Instill Static-Dynamic Clues into Structured Bitstream for AI Tasks

Beyond VVC: Towards Perceptual Quality Optimized Video Compression Using Multi-Scale Hybrid Approaches

Learning-Based Scalable Image Compression With Latent-Feature Reuse and Prediction

Hierarchical B-frame Video Coding for Long Group of Pictures

A Compressive Prior Guided Mask Predictive Coding Approach for Video Analysis

A Neural-network Enhanced Video Coding Framework beyond ECM

Deep Learning-Based CTU Splitting and Dynamic GOP Adjusting for Semantic Aware Video Compression

Task-Aware Encoder Control for Deep Video Compression