Abstract: Neural video compression has emerged as a novel paradigm combining trainable multilayer neural networks and machine learning. It achieves competitive rate-distortion (RD) performance, but remains impractical because of heavy neural architectures with large memory and computational demands. In addition, models are usually optimized for a single RD tradeoff. Recent slimmable image codecs can dynamically adjust their model capacity to gracefully reduce memory and computation requirements without harming RD performance. In this paper we propose a slimmable video codec (SlimVC), obtained by integrating a slimmable temporal entropy model into a slimmable autoencoder. Despite a significantly more complex architecture, we show that slimming remains a powerful mechanism to control rate, memory footprint, computational cost and latency, all of which are important requirements for practical video compression.

What problem does this paper attempt to address?

This paper addresses the lack of lightweight, flexible designs in neural video compression. Current neural video codecs are competitive in rate-distortion (RD) performance, but their complex architectures and high memory and computational requirements make them impractical to deploy. In addition, these models are usually optimized for a single rate-distortion trade-off, so one trained model cannot serve several bitrates. To address both issues, the paper proposes a slimmable video codec (SlimVC). By integrating a slimmable temporal entropy model into a slimmable autoencoder, it gains control over rate, memory footprint, computational cost, and latency. The design goal of SlimVC is to adjust model capacity dynamically for different application scenarios while maintaining high rate-distortion performance, so that a single model can serve different devices and network environments.

### Main Contributions

1. **Proposes a slimmable video codec (SlimVC)**: SlimVC is built from slimmable modules. It saves significant memory and computation at low-to-medium bitrates and achieves variable-rate control with a single model.
2. **Integrates a slimmable temporal entropy model**: SlimVC exploits temporal redundancy and improves compression efficiency by adding a slimmable temporal entropy model to the autoencoder.
3. **Experimentally verifies the effectiveness of SlimVC**: The experimental results show that SlimVC's rate-distortion performance is close to that of independently trained video codecs, while its computational and memory costs are lower. In particular, it achieves a speed-up of up to 20 times at low bitrates.

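
The "single rate-distortion trade-off" mentioned above is commonly expressed in learned compression as a Lagrangian objective. The formulation below is the generic textbook form, not a loss quoted from the paper; the symbols $R$, $D$, $\lambda$ and the per-width index $k$ are assumptions introduced here for illustration.

```latex
% Generic RD objective for a codec trained at one operating point:
% R = expected bitrate, D = expected distortion, \lambda = trade-off weight.
\mathcal{L} = R + \lambda D

% Hypothetical extension to a slimmable codec with K sub-network widths,
% each width k paired with its own trade-off weight \lambda_k (an assumption):
\mathcal{L}_{\text{slim}} = \sum_{k=1}^{K} \left( R_k + \lambda_k D_k \right)
```

Under this reading, a conventional codec fixes one $\lambda$ at training time, whereas a slimmable codec could cover several operating points with shared weights.
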
### Technical Details

- **SlimCAE**: The slimmable compressive autoencoder (SlimCAE) is an autoencoder that can dynamically adjust its model capacity, reducing memory and computational requirements while maintaining rate-distortion performance.
- **STEM**: The spatiotemporal entropy model (STEM) is a video compression method that does not use motion estimation. It improves compression efficiency by exploiting temporal redundancy directly in the entropy model.
- **SlimVC Framework**: SlimVC combines SlimCAE and STEM into a fully slimmable framework, comprising feature autoencoders (SlimFE, SlimFD) and entropy models (SlimHE, SlimHD, SlimTPM, SlimEPM).

### Experimental Results

- **Rate-Distortion Performance**: On the HEVC Class B and UVG datasets, SlimVC's rate-distortion performance is close to that of independently trained video codecs, indicating that it maintains high compression efficiency while providing variable-rate control.
- **Computational and Memory Efficiency**: When processing 1080p video sequences, SlimVC's computational cost is significantly lower than that of the baseline methods. The gain is largest at low bitrates, where the speed-up reaches up to 20 times. In addition, SlimVC's memory footprint can be adjusted according to bitrate requirements.

In conclusion, SlimVC targets the limitations that keep existing neural video codecs from practical use and offers a lightweight, flexible alternative.
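
The slimming mechanism described under Technical Details can be illustrated with a toy layer whose active width is chosen at run time. This is a minimal sketch under stated assumptions, not the SlimVC implementation: the class `SlimmableLinear`, its methods, and the widths used are hypothetical, and a fully connected layer stands in for the convolutional modules a real codec would use.

```python
import random


class SlimmableLinear:
    """Toy fully connected layer with a run-time selectable width.

    Hypothetical illustration only; not taken from the SlimVC paper.
    """

    def __init__(self, max_in, max_out, seed=0):
        rng = random.Random(seed)
        # One shared weight matrix sized for the widest configuration.
        self.weight = [[rng.uniform(-1.0, 1.0) for _ in range(max_in)]
                       for _ in range(max_out)]
        self.max_in = max_in
        self.max_out = max_out

    def forward(self, x, out_width):
        """Use only the first len(x) input and first out_width output channels."""
        in_width = len(x)
        if not (0 < in_width <= self.max_in and 0 < out_width <= self.max_out):
            raise ValueError("requested width exceeds layer capacity")
        return [sum(self.weight[o][i] * x[i] for i in range(in_width))
                for o in range(out_width)]

    @staticmethod
    def macs(in_width, out_width):
        """Multiply-accumulate operations for one forward pass at this width."""
        return in_width * out_width


layer = SlimmableLinear(max_in=8, max_out=8)
x_full = [1.0] * 8

y_full = layer.forward(x_full, out_width=8)      # widest sub-network
y_slim = layer.forward(x_full[:4], out_width=4)  # half-width sub-network

print(len(y_full), SlimmableLinear.macs(8, 8))   # 8 64
print(len(y_slim), SlimmableLinear.macs(4, 4))   # 4 16
```

In this toy setting the narrow sub-network reuses a slice of the same weights, and halving both input and output width cuts the multiply-accumulate count by a factor of four. This shows in miniature why a narrower configuration is cheaper; the figures say nothing about the speed-ups reported for SlimVC itself.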

Slimmable Video Codec

Standard compliant video coding using low complexity, switchable neural wrappers

Semantic Neural Rendering-based Video Coding: Towards Ultra-Low Bitrate Video Conferencing

High Efficiency Deep-learning Based Video Compression

Video Coding for Machines: Compact Visual Representation Compression for Intelligent Collaborative Analytics

Accelerating Learnt Video Codecs with Gradient Decay and Layer-wise Distillation

Accelerating Learned Video Compression via Low-Resolution Representation Learning

Cool-chic video: Learned video coding with 800 parameters

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

SSNVC: Single Stream Neural Video Compression with Implicit Temporal Information

Deep Video Codec Control for Vision Models

VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision

Sandwiched Compression: Repurposing Standard Codecs with Neural Network Wrappers

A Computationally Efficient Neural Video Compression Accelerator Based on a Sparse CNN-Transformer Hybrid Network

Low-complexity Overfitted Neural Image Codec

Neural Video Compression with Feature Modulation

Deep Generative Video Compression

A Coding Framework and Benchmark towards Low-Bitrate Video Understanding

Beyond VVC: Towards Perceptual Quality Optimized Video Compression Using Multi-Scale Hybrid Approaches.

High-Efficiency Neural Video Compression via Hierarchical Predictive Learning

C3: High-performance and low-complexity neural compression from a single image or video