Abstract: Generative face video coding (GFVC) has been demonstrated as a potential approach to low-latency, low-bitrate video conferencing. GFVC frameworks achieve an extreme gain in coding efficiency, with over 70% bitrate savings compared to conventional codecs at bitrates below 10 kbps. In recent MPEG/JVET standardization efforts, all the information required to reconstruct video sequences using GFVC frameworks has been adopted as part of the supplemental enhancement information (SEI) in existing compression pipelines. In light of this development, we aim to address a challenge that has been only weakly addressed in prior GFVC frameworks, namely reconstruction drift as the distance between the reference and target frames increases. This drift creates the need to update the reference buffer more frequently by transmitting more intra-refresh frames, which are the most expensive element of the GFVC bitstream. To overcome this problem, we instead propose multi-reference animation as a robust approach to minimizing reconstruction drift, especially when used in a bi-directional prediction mode. Further, we propose a contrastive learning formulation for multi-reference animation. We observe that using a contrastive learning framework enhances the representation capabilities of the animation generator. The resulting framework, MRDAC (Multi-Reference Deep Animation Codec), can therefore be used either to compress longer sequences with fewer reference frames or to achieve a significant gain in reconstruction accuracy at bitrates comparable to previous frameworks. Quantitative and qualitative results show significant gains in coding efficiency and reconstruction quality compared to previous GFVC methods, as well as more accurate animation in the presence of large pose and facial expression changes.

Generative Compression for Face Video: A Hybrid Scheme

A Hybrid Deep Animation Codec for Low-bitrate Video Conferencing

Dynamic Multi-Reference Generative Prediction for Face Video Compression

Beyond GFVC: A Progressive Face Video Compression Framework with Adaptive Visual Tokens

Generative Face Video Coding Techniques and Standardization Efforts: A Review

Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision

Video Coding Using Learned Latent GAN Compression

Scalable Face Image Coding via StyleGAN Prior: Toward Compression for Human-Machine Collaborative Vision

Multi-Reference Generative Face Video Compression with Contrastive Learning

Standardizing Generative Face Video Compression using Supplemental Enhancement Information

A Predictive VQ Based Video Compression Scheme

Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method

Towards Coding for Human and Machine Vision: Scalable Face Image Coding

Interactive Face Video Coding: A Generative Compression Framework

Extreme Generative Human-Oriented Video Coding Via Motion Representation Compression

Content-aware Facial Image Compression with Deep Learning Method

Predictive Coding For Animation-Based Video Compression

Hybrid model-and-object-based real-time conversational video coding

Semantic Neural Rendering-based Video Coding: Towards Ultra-Low Bitrate Video Conferencing

Deep Video Coding with Dual-Path Generative Adversarial Network

Generative Latent Coding for Ultra-Low Bitrate Image Compression