Abstract: In this paper, we propose a novel image coding framework with semantic-aware visual decomposition towards extremely low bitrate compression. In particular, an input image is decomposed into a semantic map, which serves as the structural representation, and a set of semantic-wise texture representations; both are then compressed into bitstreams at the encoder side. On the decoder side, the received bitstreams of the dual-layer representations are decoded and used to synthesize the target image with generative models. Moreover, an attention mechanism is introduced into the model architecture for texture representation modeling, and a coherency regularization is proposed to further optimize the texture representation space by aligning it with the source pixel space for higher synthesis quality. We also propose a cross-channel entropy module and control the quantization scale to facilitate rate-distortion optimization. By compressing the decomposed components into the bitstream, this simple yet effective representation philosophy benefits image compression in several respects. First, in terms of compression performance, compact representations and high visual synthesis quality bring remarkable advantages. Second, the proposed framework yields a physically explainable bitstream composed of the structural segment and semantic-wise texture segments. Third and most importantly, subsequent vision tasks (e.g., content manipulation) receive fundamental support from the semantic-aware visual decomposition and synthesis mechanism. Extensive experimental results demonstrate the superiority of the proposed framework for efficient visual representation learning, high-efficiency image compression ( bpp), and intelligent visual applications (e.g., manipulation and analysis).
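The dual-layer idea described in the abstract (a semantic map as structure, plus one texture representation per semantic region, quantized with a controllable scale) can be illustrated with a toy sketch. This is not the authors' method: the function names are hypothetical, the "texture" here is just the mean pixel value of each region rather than a learned representation, and the "synthesis" step is a simple fill rather than a generative model. It only shows how the two layers separate and how the quantization scale trades precision for fewer distinct symbols.

```python
from collections import defaultdict

# Toy stand-in for semantic-aware decomposition (assumption: grayscale image
# given as nested lists, with a same-sized map of integer semantic labels).

def decompose(image, semantic_map):
    """Return (structure, textures): the label map and one mean value per label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pixel_row, label_row in zip(image, semantic_map):
        for pixel, label in zip(pixel_row, label_row):
            sums[label] += pixel
            counts[label] += 1
    textures = {label: sums[label] / counts[label] for label in sums}
    return semantic_map, textures

def quantize(textures, scale):
    """Uniform quantization: a larger scale gives coarser symbols (lower rate)."""
    return {label: round(value / scale) for label, value in textures.items()}

def dequantize(symbols, scale):
    """Map integer symbols back to approximate texture values."""
    return {label: symbol * scale for label, symbol in symbols.items()}

def synthesize(semantic_map, textures):
    """Toy 'decoder': fill every region with its decoded texture value."""
    return [[textures[label] for label in row] for row in semantic_map]

if __name__ == "__main__":
    image = [[10, 12, 200], [11, 13, 202]]
    semantic_map = [[0, 0, 1], [0, 0, 1]]
    structure, textures = decompose(image, semantic_map)
    symbols = quantize(textures, scale=4)
    print(synthesize(structure, dequantize(symbols, scale=4)))
```

Because the texture layer is keyed by semantic label, editing one entry of the decoded texture dictionary changes a single region of the output, which is the kind of content manipulation the abstract says the decomposition supports.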

Perceptual Video Coding Based on Semantic-Guided Texture Detection and Synthesis

Edge-Based Video Compression Texture Synthesis Using Generative Adversarial Network

Video Coding with Spatio-Temporal Texture Synthesis

A Pixel-Level Segmentation-Synthesis Framework for Dynamic Texture Video Compression

Spatiotemporal Generative Adversarial Network-Based Dynamic Texture Synthesis for Surveillance Video Coding

Synthesis-Aware Region-Based 3D Video Coding

Video coding by texture analysis and synthesis using graph cut

Semantic-Aware Visual Decomposition for Image Coding

Extending HEVC Using Texture Synthesis

Semantically Video Coding: Instill Static-Dynamic Clues into Structured Bitstream for AI Tasks

Semantical Video Coding: Instill Static-Dynamic Clues into Structured Bitstream for AI Tasks

Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis

From Visual Search to Video Compression: A Compact Representation Framework for Video Feature Descriptors

SVS-GAN: Leveraging GANs for Semantic Video Synthesis

Surveillance Video Coding with Dynamic Textural Background Detection

Semantic Neural Rendering-based Video Coding: Towards Ultra-Low Bitrate Video Conferencing

Smoothed Reference Inter-Layer Texture Prediction For Bit Depth Scalable Video Coding

Generic video coding with abstraction and detail completion

AV1 Video Coding Using Texture Analysis With Convolutional Neural Networks

Wireless Deep Video Semantic Transmission

Beyond VVC: Towards Perceptual Quality Optimized Video Compression Using Multi-Scale Hybrid Approaches