Abstract: For real-time, low-delay video surveillance and teleconferencing applications, the new video coding standard HEVC achieves much higher coding efficiency than H.264/AVC. However, we argue that the hierarchical prediction structure in the HEVC low-delay encoder does not fully exploit the special characteristics of surveillance and conference videos, which are usually captured by stationary cameras. In this case, the background picture (G-picture), which is modeled from the original input frames, can be used to further improve HEVC low-delay coding efficiency while reducing complexity. We therefore propose a method that uses background modeling to optimize hierarchical prediction and coding in HEVC for these videos. First, we conduct several experimental and theoretical analyses of how to use the G-picture to optimize the hierarchical prediction structure and hierarchical quantization. Following these results, we propose encoding the G-picture as a long-term reference frame to improve background prediction, and then present a G-picture-based bit-allocation algorithm to increase coding efficiency. In addition, an adaptive speed-up algorithm classifies each coding unit (CU) into one of several categories according to its proportions of background and foreground pixels, and applies a different speed-up strategy to each category to reduce encoding complexity. To evaluate performance, extensive experiments are performed on the HEVC test model. The results show that our method saves 39.09% of bits and reduces encoding complexity by 43.63% on average on surveillance videos; the corresponding figures for conference videos are 5.27% and 43.68%.
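
The CU classification step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the category names, the thresholds, or how the foreground mask is obtained, so everything here is an assumption made for illustration: the function name `classify_cus`, the three labels, the thresholds `low` and `high`, and a binary mask where 1 marks a foreground pixel (for example, a pixel that differs noticeably from the G-picture).

```python
from typing import List


def classify_cus(
    fg_mask: List[List[int]],
    cu_size: int = 8,
    low: float = 0.05,   # hypothetical threshold, not taken from the paper
    high: float = 0.5,   # hypothetical threshold, not taken from the paper
) -> List[List[str]]:
    """Label each cu_size x cu_size block by its share of foreground pixels.

    fg_mask is a 2-D list of 0/1 values (1 = foreground). Blocks at the
    right and bottom edges may be smaller than cu_size.
    """
    height = len(fg_mask)
    width = len(fg_mask[0]) if height else 0
    labels: List[List[str]] = []
    for top in range(0, height, cu_size):
        row_labels: List[str] = []
        for left in range(0, width, cu_size):
            # Collect the mask values covered by this block.
            block = [
                fg_mask[y][x]
                for y in range(top, min(top + cu_size, height))
                for x in range(left, min(left + cu_size, width))
            ]
            ratio = sum(block) / len(block)
            if ratio <= low:
                row_labels.append("background")
            elif ratio >= high:
                row_labels.append("foreground")
            else:
                row_labels.append("hybrid")
        labels.append(row_labels)
    return labels


if __name__ == "__main__":
    # 4x8 mask: left 4x4 block is all background, right 4x4 block all foreground.
    mask = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(4)]
    print(classify_cus(mask, cu_size=4))  # [['background', 'foreground']]
```

An encoder following this idea could then choose a cheaper mode search for blocks labeled as background and a fuller search for the others; which strategy the paper attaches to each category is not stated in the abstract.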

Hierarchical Coding for Talking-Head Video

Region-of-Interest Based Conversational HEVC Coding with Hierarchical Perception Model of Face

Beyond Keypoint Coding: Temporal Evolution Inference with Compact Feature Representation for Talking Face Video Compression

DAVD-Net: Deep Audio-Aided Video Decompression of Talking Heads

High-Efficiency Neural Video Compression via Hierarchical Predictive Learning

Deep Hierarchical Video Compression

Audio-driven Talking Face Video Generation with Natural Head Pose

Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation

Predictive Coding For Animation-Based Video Compression

Compressing Video Calls using Synthetic Talking Heads

Compact Temporal Trajectory Representation for Talking Face Video Compression

High-Fidelity and Freely Controllable Talking Head Video Generation

Optimizing the hierarchical prediction and coding in HEVC for surveillance and conference videos with background modeling

Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline

Hierarchical Piece-Wise Linear Projections for Efficient Intra-Prediction Coding

Hybrid model-and-object-based real-time conversational video coding

Face Region Based Conversational Video Coding

A Hybrid Deep Animation Codec for Low-bitrate Video Conferencing

Towards Coding for Human and Machine Vision: Scalable Face Image Coding

Generating Smooth and Facial-Details-Enhanced Talking Head Video: A Perspective of Pre and Post Processes

Temporal context video compression with flow-guided feature prediction