Abstract: Motion-model-based video coding, which employs sparse sets of keypoints instead of dense optical flow, can efficiently compress videos at ultra-low bitrates. Such schemes achieve notable gains over traditional video codecs in face-centric scenarios such as video conferencing. However, owing to the high complexity of human poses, motion-model-based coding of human body video remains under-explored, especially under large pose variations. To overcome this limitation, we present a thin-plate spline motion-model-based portrait video compression framework oriented to adaptive pose processing. First, a more flexible thin-plate spline transformation, rather than a simple affine transformation, is adopted for motion estimation, since its nonlinearity allows more complex motions to be represented. Meanwhile, spatial constraints are incorporated into the keypoint detector to generate keypoints that are more consistent with human poses, yielding more accurate optical flow. In addition, a motion intensity evaluation module is designed at the encoder side to dynamically evaluate inter-frame motion intensity. An Adaptive Reference Frame Selection algorithm is then devised at the decoder side to select the reconstruction scheme according to the intensity of portrait motion. Finally, a multi-frame reconstruction module is introduced for large pose variations to improve the consistency of human pose and the subjective quality. Experimental results demonstrate that, compared with the state-of-the-art Versatile Video Coding (VVC) standard and existing motion-model-based compression techniques, the proposed scheme copes better with large pose variations and achieves better objective and subjective quality at similar bitrates, with higher temporal consistency.
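To make the contrast between affine and thin-plate spline (TPS) motion concrete, the following is a minimal sketch of a generic textbook 2-D TPS fitted to matched keypoints, using the radial basis U(r) = r^2 log r^2 (a common choice; variants using log r differ only by a constant scale). It is written in pure Python for self-containment and is not the authors' implementation. The names `ThinPlateSpline`, `motion_intensity` and `select_mode`, the mean-displacement intensity measure, and the 0.1 threshold are hypothetical illustrations of how an encoder-side motion intensity score might drive a decoder-side choice between single-reference and multi-frame reconstruction; they are assumptions, not the paper's modules.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _tps_kernel(r2: float) -> float:
    """Radial basis U(r) = r^2 log(r^2), taking U(0) = 0 (its limit)."""
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: control points may be degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class ThinPlateSpline:
    """2-D TPS that maps each src keypoint exactly onto its dst keypoint.

    f(p) = a0 + a1*x + a2*y + sum_i w_i * U(|p - src_i|): an affine part plus
    a nonlinear radial part, which is what lets TPS bend where affine cannot.
    """

    def __init__(self, src: Sequence[Point], dst: Sequence[Point]):
        if len(src) != len(dst) or len(src) < 3:
            raise ValueError("need at least 3 matching, non-collinear control points")
        self.src = list(src)
        k = len(self.src)
        n = k + 3
        # Build the block system [[K, P], [P^T, 0]] with K_ij = U(|src_i - src_j|)
        # and P_i = [1, x_i, y_i].
        a = [[0.0] * n for _ in range(n)]
        for i, (xi, yi) in enumerate(self.src):
            for j, (xj, yj) in enumerate(self.src):
                a[i][j] = _tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
            for c, v in enumerate((1.0, xi, yi)):
                a[i][k + c] = v
                a[k + c][i] = v
        # One solve per output coordinate: coefficients are [w_1..w_k, a0, a1, a2].
        self.coef_x = _solve(a, [p[0] for p in dst] + [0.0] * 3)
        self.coef_y = _solve(a, [p[1] for p in dst] + [0.0] * 3)

    def __call__(self, pt: Point) -> Point:
        x, y = pt
        k = len(self.src)
        basis = [_tps_kernel((x - sx) ** 2 + (y - sy) ** 2) for sx, sy in self.src]
        out = []
        for coef in (self.coef_x, self.coef_y):
            nonlinear = sum(w * u for w, u in zip(coef[:k], basis))
            a0, a1, a2 = coef[k:]
            out.append(a0 + a1 * x + a2 * y + nonlinear)
        return out[0], out[1]


def motion_intensity(kp_prev: Sequence[Point], kp_curr: Sequence[Point]) -> float:
    """Hypothetical intensity score: mean keypoint displacement between frames."""
    return sum(math.dist(p, q) for p, q in zip(kp_prev, kp_curr)) / len(kp_prev)


def select_mode(intensity: float, threshold: float = 0.1) -> str:
    """Hypothetical rule: large motion triggers multi-frame reconstruction."""
    return "multi_frame" if intensity > threshold else "single_reference"
```

A quick usage example: with keypoints in normalized coordinates, `ThinPlateSpline(src, dst)` returns a callable warp, and when `dst` is an affine image of `src` the radial weights vanish and the TPS reduces to that affine map. The bending term is only used when the keypoints move non-rigidly, which is why TPS can represent pose changes that a single affine transform cannot.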

Beyond Keypoint Coding: Temporal Evolution Inference with Compact Feature Representation for Talking Face Video Compression

Compact Temporal Trajectory Representation for Talking Face Video Compression

Interactive Face Video Coding: A Generative Compression Framework

Towards Coding for Human and Machine Vision: Scalable Face Image Coding

Predictive Coding For Animation-Based Video Compression

Compressing Video Calls using Synthetic Talking Heads

Temporal context video compression with flow-guided feature prediction

From Visual Search to Video Compression: A Compact Representation Framework for Video Feature Descriptors

Audio-driven Talking Face Video Generation with Natural Head Pose

Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach

Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization

Dynamic Multi-Reference Generative Prediction for Face Video Compression

Towards Analysis-Friendly Face Representation with Scalable Feature and Texture Compression

Beyond GFVC: A Progressive Face Video Compression Framework with Adaptive Visual Tokens

2D/3D Model-Based Facial Video Coding/Decoding at Ultra-Low Bit-Rate

Hybrid model-and-object-based real-time conversational video coding

Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency

Collaborative Scalable Visual Compression for Human-Centered Videos

FVC: An End-to-End Framework Towards Deep Video Compression in Feature Space

Model-based portrait video compression with spatial constraint and adaptive pose processing

Teacher-student learning with multi-granularity constraint towards compact facial feature representation