Abstract: Motion-model-based video coding, which employs sparse sets of keypoints instead of dense optical flow, can efficiently compress videos at ultra-low bitrates. Such schemes achieve notable performance gains over traditional video codecs in face-centric scenarios such as video conferencing. However, owing to the high complexity of human poses, motion-model-based coding of human body video remains under-explored, especially under large pose variations. To overcome this limitation, we present a thin-plate spline motion model based portrait video compression framework with adaptive pose processing. First, a more flexible thin-plate spline transformation, rather than a simple affine transformation, is adopted for motion estimation, since its nonlinearity allows it to represent more complex motions. Meanwhile, spatial constraints are incorporated into the keypoint detector to generate keypoints that are more consistent with human poses, yielding more accurate optical flow. In addition, a motion intensity evaluation module is designed at the encoder side to dynamically evaluate inter-frame motion intensity. An Adaptive Reference Frame Selection algorithm is then devised at the decoder side to select the reconstruction scheme according to the intensity of portrait motion. Finally, a multi-frame reconstruction module is introduced for large pose variations to improve human pose consistency and subjective quality. Experimental results demonstrate that, compared with the state-of-the-art video coding standard Versatile Video Coding (VVC) and existing motion-model-based compression techniques, the proposed scheme copes better with large pose variations and achieves better objective and subjective quality at similar bitrates, with higher temporal consistency.
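To make the contrast between a thin-plate spline (TPS) transformation and a purely affine one concrete, the following minimal sketch fits a classical 2D TPS to a sparse set of keypoint correspondences and evaluates it at arbitrary points. It is an illustration under stated assumptions, not the framework's implementation: the function names (`tps_kernel`, `fit_tps`, `apply_tps`), the kernel U(r) = r² log r², and the exact-interpolation formulation without regularization are choices made here for clarity, and the keypoints in the usage note are made up.

```python
import numpy as np

def tps_kernel(r2):
    # Radial basis U = r^2 * log(r^2), defined as 0 at r = 0.
    safe = np.where(r2 > 0, r2, 1.0)  # avoid log(0); log(1) = 0
    return np.where(r2 > 0, r2 * np.log(safe), 0.0)

def fit_tps(src, dst):
    """Fit a TPS mapping src keypoints (K, 2) onto dst keypoints (K, 2).

    Returns a (K + 3, 2) array: K nonlinear weights followed by the
    3 affine coefficients (bias, x, y) for each output coordinate.
    Assumes the source keypoints are distinct and not all collinear.
    """
    k = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((k, 1)), src])  # affine part [1, x, y]
    A = np.zeros((k + 3, k + 3))
    A[:k, :k] = tps_kernel(d2)
    A[:k, k:] = P
    A[k:, :k] = P.T  # side conditions on the nonlinear weights
    b = np.zeros((k + 3, 2))
    b[:k] = dst
    return np.linalg.solve(A, b)

def apply_tps(params, src, pts):
    """Warp query points (N, 2) with a TPS fitted on src keypoints."""
    k = src.shape[0]
    d2 = np.sum((pts[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return tps_kernel(d2) @ params[:k] + P @ params[k:]
```

As a usage example, with five hypothetical keypoints `src` and displaced keypoints `dst`, `apply_tps(fit_tps(src, dst), src, src)` reproduces `dst`, so the warp passes exactly through every keypoint. When the correspondences happen to be affine, the nonlinear weights vanish and the TPS reduces to that affine map, which is why it can be seen as a strict generalization of the affine motion model.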

Model-based portrait video compression with spatial constraint and adaptive pose processing
