Slimmable Video Codec

Zhaocheng Liu, Luis Herranz, Fei Yang, Saiping Zhang, Shuai Wan, Marta Mrak, Marc Górriz Blanch
DOI: https://doi.org/10.48550/arXiv.2205.06754
2022-05-14
Abstract: Neural video compression has emerged as a novel paradigm combining trainable multilayer neural networks and machine learning, achieving competitive rate-distortion (RD) performances, but still remaining impractical due to heavy neural architectures, with large memory and computational demands. In addition, models are usually optimized for a single RD tradeoff. Recent slimmable image codecs can dynamically adjust their model capacity to gracefully reduce the memory and computation requirements, without harming RD performance. In this paper we propose a slimmable video codec (SlimVC), by integrating a slimmable temporal entropy model in a slimmable autoencoder. Despite a significantly more complex architecture, we show that slimming remains a powerful mechanism to control rate, memory footprint, computational cost and latency, all being important requirements for practical video compression.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to make neural video compression lightweight and flexible enough for practical use. Although current neural video codecs achieve competitive rate-distortion (RD) performance, their heavy architectures and large memory and computational demands make them impractical to deploy. In addition, these models are usually optimized for a single RD trade-off, so a separate model is needed for each operating point. To address these problems, the paper proposes a slimmable video codec (SlimVC), which integrates a slimmable temporal entropy model into a slimmable autoencoder and thereby provides control over rate, memory footprint, computational cost, and latency. SlimVC is designed to adjust its model capacity dynamically to suit different application scenarios while maintaining high RD performance, so that it can meet the requirements of different devices and network environments.

### Main Contributions

1. **A new slimmable video codec (SlimVC)**: SlimVC is built from slimmable modules, saves considerable memory and computation at low-to-medium bitrates, and achieves variable-rate control with a single model.
2. **A slimmable temporal entropy model**: Unlike a slimmable image codec, which codes each frame independently, SlimVC exploits temporal redundancy, and thus improves compression efficiency, by adding a slimmable temporal entropy model to the autoencoder.
3. **Experimental validation of SlimVC**: The experiments show that SlimVC's RD performance is close to that of independently trained video codecs, while being highly efficient in computation and memory. In particular, it achieves a speed-up of up to 20 times at low bitrates.
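Slimming lets one set of weights serve several model capacities. The sketch below shows the usual mechanism in slimmable networks, assuming that a narrower sub-model reuses the first `width` channels of each shared weight tensor. The class `SlimmableLinear`, the helper `layer_macs`, and the full width of 192 are hypothetical illustrations, not code from the paper, and a fully connected layer stands in for a convolution.

```python
import random


class SlimmableLinear:
    """Hypothetical slimmable layer: all widths share one weight matrix.

    A sub-model of a given width uses only the top-left block of the
    full matrix (the first `in_width` inputs and first `out_width` outputs).
    """

    def __init__(self, max_in, max_out, seed=0):
        rng = random.Random(seed)
        self.max_in = max_in
        self.max_out = max_out
        # One shared parameter set, sized for the widest configuration.
        self.weight = [[rng.uniform(-1.0, 1.0) for _ in range(max_in)]
                       for _ in range(max_out)]

    def forward(self, x, out_width):
        in_width = len(x)  # the previous layer's width fixes the input size
        if in_width > self.max_in or out_width > self.max_out:
            raise ValueError("requested width exceeds the full model")
        # Only the first out_width rows and first in_width columns are used.
        return [sum(self.weight[o][i] * x[i] for i in range(in_width))
                for o in range(out_width)]


def layer_macs(in_width, out_width, spatial=1, kernel=1):
    """Multiply-accumulates of a k x k conv layer over `spatial` = H*W positions."""
    return spatial * kernel * kernel * in_width * out_width


layer = SlimmableLinear(192, 192)           # 192: assumed full width
x = [0.5] * 192
full_out = layer.forward(x, 192)            # widest sub-model
half_out = layer.forward(x[:96], 96)        # half-width sub-model, same weights
print(len(full_out), len(half_out))         # 192 96
print(layer_macs(96, 96) / layer_macs(192, 192))  # 0.25
```

Because the input and output channels of an inner layer shrink together, its cost and parameter count scale with the square of the width. Assuming narrower widths serve lower bitrates, this is consistent with the large savings reported at low bitrates.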
### Technical Details

- **SlimCAE**: The slimmable compressive autoencoder (SlimCAE) is an autoencoder whose model capacity can be adjusted dynamically, reducing memory and computational requirements while maintaining RD performance.
- **STEM**: The spatiotemporal entropy model (STEM) is a video compression approach that does not use motion estimation. Instead, it improves compression efficiency by exploiting temporal redundancy directly in the entropy model.
- **SlimVC framework**: SlimVC combines SlimCAE and STEM into a fully slimmable framework comprising feature autoencoders (SlimFE, SlimFD) and entropy models (SlimHE, SlimHD, SlimTPM, SlimEPM).

### Experimental Results

- **Rate-distortion performance**: On the HEVC Class B and UVG datasets, SlimVC's RD performance is close to that of independently trained video codecs, indicating that it keeps high compression efficiency while also providing variable-rate control.
- **Computational and memory efficiency**: On 1080p sequences, SlimVC's computational cost is significantly lower than that of the baseline methods, and at low bitrates the speed-up is very large (up to 20 times). SlimVC's memory footprint can also be adjusted to match the bitrate requirement, which is an advantage in practical applications.

In conclusion, by proposing SlimVC, the paper addresses the limitations that keep existing neural video codecs from practical use and provides a lightweight, flexible solution.
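STEM exploits temporal redundancy inside the entropy model rather than through motion estimation. The toy sketch below illustrates why this saves bits. It estimates the ideal code length of an integer latent under a unit-bin discretized Gaussian, first with a zero-mean prior and then with the mean taken from the previous frame's latent. The Gaussian model, the identity predictor, the fixed scales, and the latent values are all hypothetical stand-ins for the learned entropy model in the paper, whose exact form this sketch does not reproduce.

```python
import math


def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def estimated_bits(symbols, means, scales):
    """Ideal code length (bits) of integer symbols under unit-bin discretized Gaussians."""
    total = 0.0
    for y, mu, sigma in zip(symbols, means, scales):
        # Probability mass of the quantization bin [y - 0.5, y + 0.5).
        p = gaussian_cdf(y + 0.5, mu, sigma) - gaussian_cdf(y - 0.5, mu, sigma)
        total -= math.log2(max(p, 1e-12))  # guard against log(0)
    return total


# Toy latents: the current frame's latent is close to the previous one.
prev_latent = [3, -2, 5, 0, 7, -4]
curr_latent = [3, -1, 5, 1, 7, -4]
n = len(curr_latent)

# Without temporal context: a zero-mean prior with a wide (assumed) scale.
bits_no_context = estimated_bits(curr_latent, [0.0] * n, [4.0] * n)

# With temporal context: a hypothetical predictor that takes the previous
# latent as the mean and is confident enough to use a narrower scale.
bits_temporal = estimated_bits(curr_latent, [float(v) for v in prev_latent], [1.0] * n)

print(f"{bits_no_context:.2f} bits without context, {bits_temporal:.2f} bits with temporal context")
```

Here the differences between consecutive latents are small, so the temporally conditioned model assigns them much higher probability and fewer bits. In a learned codec the mean and scale would be predicted by a network rather than fixed by hand.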