Disentangled Visual Representations for Extreme Human Body Video Compression

Ruofan Wang, Qi Mao, Shiqi Wang, Chuanmin Jia, Ronggang Wang, Siwei Ma
DOI: https://doi.org/10.1109/icme52920.2022.9859831
2022-01-01
Abstract: Recent years have witnessed the great promise of deep neural video compression codecs. However, unprecedented challenges remain when videos must be encoded at extremely low bitrates. Motivated by recent attempts at layered conceptual image compression, we make the first attempt to leverage disentangled visual representations for extreme human body video compression. More specifically, to capture the main structure, we adopt the inferred human pose keypoints as the structure code of each frame, and derive the motion information from the structure codes of adjacent frames for further compression. To better exploit texture redundancy, all frames share the same texture codes, and the proposed texture contrastive learning ensures texture consistency within a video. The two branches are then transmitted separately, and at the decoder side the generator synthesizes the reconstructed video from the combination of all decoded representations. Both qualitative and quantitative experimental results demonstrate that the proposed scheme can produce perceptually pleasing reconstructions at ultra-low bitrates, far below those that other video codecs can reach.
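The two-branch idea can be illustrated with a small, hypothetical sketch; it is not the authors' implementation. It assumes that each frame's structure code is a list of (x, y) pose keypoints, that "motion information" means the quantized difference between the keypoints of adjacent frames (uniform scalar quantization with step `step`), and that texture contrastive learning can be stood in for by a generic InfoNCE-style loss over texture code vectors. The abstract specifies none of these details. The names `encode_structure`, `decode_structure`, and `info_nce` are illustrative, and entropy coding, pose estimation, and the generator are omitted.

```python
import math
from typing import List, Tuple

# Hypothetical layout: one (x, y) coordinate per detected body joint.
Keypoints = List[Tuple[float, float]]


def _dequantize(q: List[Tuple[int, int]], step: float) -> Keypoints:
    return [(qx * step, qy * step) for qx, qy in q]


def encode_structure(frames: List[Keypoints], step: float = 1.0) -> List[List[Tuple[int, int]]]:
    """Assumed scheme: quantize the first frame's keypoints directly and every
    later frame as the motion (difference) from the previous reconstructed frame."""
    symbols = []
    prev = None
    for kps in frames:
        if prev is None:
            q = [(round(x / step), round(y / step)) for x, y in kps]
            rec = _dequantize(q, step)
        else:
            q = [(round((x - px) / step), round((y - py) / step))
                 for (x, y), (px, py) in zip(kps, prev)]
            rec = [(px + dx, py + dy)
                   for (dx, dy), (px, py) in zip(_dequantize(q, step), prev)]
        symbols.append(q)
        prev = rec  # closed loop: predict from what the decoder will reconstruct
    return symbols


def decode_structure(symbols: List[List[Tuple[int, int]]], step: float = 1.0) -> List[Keypoints]:
    """Invert encode_structure by accumulating the decoded motion."""
    frames: List[Keypoints] = []
    for q in symbols:
        deltas = _dequantize(q, step)
        if not frames:
            frames.append(deltas)
        else:
            frames.append([(px + dx, py + dy)
                           for (dx, dy), (px, py) in zip(deltas, frames[-1])])
    return frames


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def info_nce(anchor: List[float], positive: List[float],
             negatives: List[List[float]], temperature: float = 0.1) -> float:
    """Stand-in contrastive loss: small when the anchor texture code is closer to
    a code from the same video (positive) than to codes from other videos."""
    logits = [_cosine(anchor, positive) / temperature]
    logits += [_cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]
```

Under these assumptions, only the first frame's keypoints and small per-frame differences travel in the structure branch, while a single texture code would be sent once per video. Predicting from the reconstructed rather than the original previous frame keeps quantization error from accumulating, so each decoded coordinate stays within `step / 2` of its original value.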