Abstract: Motion-model-based video coding approaches use sparse sets of keypoints instead of dense optical flow, which lets them compress videos efficiently at ultra-low bitrates. Such schemes achieve notable performance gains over traditional video codecs in face-centric scenarios such as video conferencing. However, human poses are highly complex, and motion-model-based coding of human body video is still under-explored, especially when pose variations are large. To overcome this limitation, we present a portrait video compression framework that is based on a thin-plate spline (TPS) motion model and oriented to adaptive pose processing. First, we adopt a more flexible TPS transformation for motion estimation in place of a simple affine transformation, because its nonlinearity can represent more complex motions. Meanwhile, spatial constraints are incorporated into the keypoint detector so that the generated keypoints are more consistent with human poses, which yields more accurate optical flow. In addition, a motion intensity evaluation module at the encoder side dynamically evaluates inter-frame motion intensity. An Adaptive Reference Frame Selection algorithm at the decoder side then selects the reconstruction scheme according to the intensity of the portrait motion. Finally, a multi-frame reconstruction module is introduced for large pose variations to improve the consistency of human poses and the subjective quality. Experimental results demonstrate that the proposed scheme copes better with large pose variations than both the state-of-the-art video coding standard Versatile Video Coding (VVC) and existing motion-model-based compression techniques. At similar bitrates, it achieves better objective and subjective quality, with higher temporal consistency.
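The abstract argues that a thin-plate spline (TPS) can represent motions that an affine transformation cannot. A minimal NumPy sketch of that contrast follows. It uses standard TPS interpolation from matched keypoints, in Bookstein's formulation with U(r) = r² log r², and samples the resulting map as a dense displacement field. It is not the authors' implementation. The function names `fit_tps`, `apply_tps` and `dense_flow`, the normalized [-1, 1] coordinate convention and the example keypoints are all hypothetical, chosen only for illustration.

```python
import numpy as np


def tps_kernel(r2):
    """Radial basis U(r) = r^2 log r^2, evaluated on squared distances r2, with U(0) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r2 * np.log(r2)
    return np.where(r2 > 0, u, 0.0)


def fit_tps(src, dst):
    """Fit a 2-D thin-plate spline f with f(src[i]) = dst[i] for every keypoint i.

    src, dst: (n, 2) arrays of matched keypoints (n >= 3, not all collinear).
    Returns radial weights (n, 2) and affine coefficients (3, 2).
    """
    n = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((n, 1)), src])   # affine basis rows [1, x, y]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = tps_kernel(d2)              # nonlinear (radial) interactions
    L[:n, n:] = P
    L[n:, :n] = P.T                         # side conditions: P^T weights = 0
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = dst
    params = np.linalg.solve(L, rhs)
    return params[:n], params[n:]


def apply_tps(points, src, weights, affine):
    """Evaluate f(p) = [1, x, y] @ affine + sum_i weights[i] * U(|p - src[i]|) at (m, 2) points."""
    d2 = np.sum((points[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return P @ affine + tps_kernel(d2) @ weights


def dense_flow(src, weights, affine, height, width):
    """Displacement f(p) - p on a normalized [-1, 1] grid; returns (height, width, 2) as (dx, dy)."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, height), np.linspace(-1, 1, width), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
    return (apply_tps(grid, src, weights, affine) - grid).reshape(height, width, 2)


# Hypothetical keypoints: four fixed corner points and one centre point (e.g. a joint) that moves.
src = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
dst = src.copy()
dst[4] += [0.0, 0.3]

weights, affine = fit_tps(src, dst)
tps_err = np.abs(apply_tps(src, src, weights, affine) - dst).max()  # ~0: all keypoints matched

# The least-squares affine fit cannot bend, so it leaves a residual (about 0.24 at the centre).
P = np.hstack([np.ones((len(src), 1)), src])
A_aff, *_ = np.linalg.lstsq(P, dst, rcond=None)
affine_err = np.abs(P @ A_aff - dst).max()

flow = dense_flow(src, weights, affine, 64, 64)
print(f"TPS max error {tps_err:.2e}, affine max error {affine_err:.2f}, flow {flow.shape}")
```

The TPS map reproduces every keypoint exactly. The best affine map can only translate, rotate, scale and shear the whole frame, so it cannot move the centre point independently of the corners. Keypoint-based animation models commonly fit such a map from driving-frame to reference-frame keypoints and use the resulting flow for backward warping. The abstract does not say how the proposed framework parameterizes or combines its TPS transformations.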

Dynamic Multi-Reference Generative Prediction for Face Video Compression

Multi-Reference Generative Face Video Compression with Contrastive Learning

Beyond Keypoint Coding: Temporal Evolution Inference with Compact Feature Representation for Talking Face Video Compression

Predictive Coding For Animation-Based Video Compression

Extreme Generative Human-Oriented Video Coding Via Motion Representation Compression

Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization

Deep Generative Video Compression

Compact Temporal Trajectory Representation for Talking Face Video Compression

Audio-driven Talking Face Video Generation with Natural Head Pose

M-LVC: Multiple Frames Prediction for Learned Video Compression

Deep Video Compression with Scaled Hierarchical Bi-directional Motion Model

Deep Reference Generation with Multi-Domain Hierarchical Constraints for Inter Prediction

Model-Based Portrait Video Compression with Spatial Constraint and Adaptive Pose Processing

Butterfly: Multiple Reference Frames Feature Propagation Mechanism for Neural Video Compression

Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision

Compressing Scene Dynamics: A Generative Approach

High-Efficiency Neural Video Compression via Hierarchical Predictive Learning

Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation

Interactive Face Video Coding: A Generative Compression Framework

Generative Face Video Coding Techniques and Standardization Efforts: A Review