Jerry Liu, Shenlong Wang, Wei-Chiu Ma, Meet Shah, Rui Hu, Pranaab Dhawan, Raquel Urtasun
Abstract: We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames. Unlike prior learning-based approaches, we reduce complexity by not performing any form of explicit transformations between frames and assume each frame is encoded with an independent state-of-the-art deep image compressor. We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs while being much faster and easier to implement. We then propose a novel internal learning extension on top of this architecture that brings an additional 10% bitrate savings without trading off decoding speed. Importantly, we show that our approach outperforms H.265 and other deep learning baselines in MS-SSIM on higher bitrate UVG video, and against all video codecs on lower framerates, while being thousands of times faster in decoding than deep models utilizing an autoregressive entropy model.
Image and Video Processing, Computer Vision and Pattern Recognition, Information Theory
What problem does this paper attempt to address?
The paper aims to make video compression more efficient while maintaining or improving the quality of the compressed video. Specifically, the authors propose a video compression framework based on conditional entropy coding, with three goals:
1. **Reduce model complexity**: Unlike previous learning-based video compression methods, this framework performs no explicit inter-frame transformations (such as motion compensation, frame interpolation, or residual coding). Instead, it assumes that each frame is encoded by an independent state-of-the-art deep image compressor. This reduces the complexity of the model and makes it simpler to implement and faster to run.
2. **Improve compression efficiency**: By modeling only the conditional entropy between frames, the framework achieves compression performance comparable to or better than existing neural video compression works and other video codecs while keeping decoding fast. In particular, on higher-bitrate UVG video it outperforms H.265 and other deep learning baselines in MS-SSIM, and it outperforms all video codecs at lower frame rates.
3. **Optimize via internal learning**: The authors also propose a new internal learning extension that further optimizes the latent codes of frames at inference time, saving an additional 10% of the bit rate without sacrificing decoding speed.
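The quantity being modeled in these goals can be written out explicitly. As a sketch, let $y_t$ denote the quantized latent code of frame $t$ produced by the image compressor and $p_\theta$ the learned entropy model; both symbols are notation assumed here for illustration, as is the restriction to conditioning on the immediately preceding frame only. The expected bit rate of a $T$-frame clip is then a cross-entropy, bounded below by the true conditional entropies:

$$
R \;=\; \mathbb{E}\!\left[-\log_2 p_\theta(y_1) \;-\; \sum_{t=2}^{T} \log_2 p_\theta(y_t \mid y_{t-1})\right]
\;\ge\; H(y_1) + \sum_{t=2}^{T} H(y_t \mid y_{t-1}).
$$

Because $H(y_t \mid y_{t-1}) \le H(y_t)$, an ideal conditional model never spends more bits than coding each frame's latent code independently, and it spends fewer whenever consecutive codes are statistically dependent.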
### Specific problem description
- **Reduce model complexity**: Traditional video compression methods usually require complex motion estimation and compensation steps. These steps increase computational complexity and are hard to parallelize, which slows down encoding and decoding. The paper avoids them by modeling only the conditional entropy between frames, which simplifies the model and speeds up processing.
- **Improve compression efficiency**: Although existing learning-based video compression methods can outperform traditional video codecs in some cases, they are usually slower than standard video codecs at encoding and decoding. The proposed method is comparable to existing methods in compression performance while decoding thousands of times faster than deep models that use an autoregressive entropy model.
- **Optimize via internal learning**: Existing learned compression methods use a fixed encoder at inference time, so the latent code is not optimized for the specific test video. By introducing internal learning, the paper further optimizes the latent code at inference time, reducing the bit rate without increasing decoding time.
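To make the conditional entropy argument concrete, the following minimal sketch compares the empirical cost of coding a frame's symbols independently with the cost of coding them given the previous frame. The symbol sequences are hypothetical toy data standing in for latent codes, and the empirical-count estimators are a simplification; the paper uses a learned entropy model, not counting.

```python
import math
from collections import Counter

def entropy(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def conditional_entropy(prev, curr):
    """Empirical H(curr | prev) in bits per symbol, via H(prev, curr) - H(prev)."""
    return entropy(list(zip(prev, curr))) - entropy(prev)

# Toy "latent codes" for two consecutive frames (hypothetical data):
# frame2 equals frame1 except at every fifth position, where the symbol shifts.
frame1 = [0, 1, 2, 3] * 8
frame2 = [(s + 1) % 4 if i % 5 == 0 else s for i, s in enumerate(frame1)]

independent_bits = entropy(frame2)                      # code frame2 on its own
conditional_bits = conditional_entropy(frame1, frame2)  # code frame2 given frame1

print(f"independent: {independent_bits:.3f} bits/symbol")
print(f"conditional: {conditional_bits:.3f} bits/symbol")
```

Because most symbols in `frame2` repeat those in `frame1`, knowing the previous frame makes the current one far more predictable, so the conditional estimate comes out lower than the independent one.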
### Main contributions of the paper
1. **Base model**: Proposes a simple conditional entropy model that operates only on the latent codes produced by a deep single-image compressor. It reduces the joint bit rate by maximizing the probability of a frame's latent code given the previous frame.
2. **Internal learning**: Introduces internal learning at the inference stage, further reducing the bit rate by optimizing the latent code of each frame without affecting the decoding speed.
Together, these contributions yield a video compression approach that is simple to implement and fast to decode.
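The internal learning idea can be illustrated with a toy example: keep the decoder and entropy model fixed and run gradient descent on the latent code alone, so the decoder does no extra work. Everything below is an assumption made for illustration and not the paper's implementation: the linear `decode` function, the Gaussian rate proxy centered on the previous frame's code, the names `refine_latent` and `objective`, and all numeric values. Quantization of the latent code is omitted for simplicity.

```python
def decode(y, weights):
    """Fixed linear 'decoder' (a stand-in for a real image decoder)."""
    return [sum(w * v for w, v in zip(row, y)) for row in weights]

def objective(y, x, y_prev, weights, lam, sigma):
    """Rate proxy plus lam * distortion for a single frame."""
    x_hat = decode(y, weights)
    distortion = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # Negative log-likelihood (up to a constant) under a Gaussian centered on
    # the previous frame's code: a toy conditional entropy model.
    rate = sum((a - b) ** 2 for a, b in zip(y, y_prev)) / (2 * sigma ** 2)
    return rate + lam * distortion

def refine_latent(y0, x, y_prev, weights, lam=1.0, sigma=1.0, lr=0.05, steps=200):
    """Gradient descent on the latent only; decoder and entropy model stay fixed."""
    y = list(y0)
    for _ in range(steps):
        x_hat = decode(y, weights)
        residual = [a - b for a, b in zip(x_hat, x)]
        grad = []
        for j in range(len(y)):
            d_dist = 2 * sum(weights[i][j] * residual[i] for i in range(len(x)))
            d_rate = (y[j] - y_prev[j]) / sigma ** 2
            grad.append(d_rate + lam * d_dist)
        y = [v - lr * g for v, g in zip(y, grad)]
    return y

weights = [[1.0, 0.5], [0.0, 1.0]]  # hypothetical decoder weights
x = [2.0, 1.0]                      # toy frame to compress
y_prev = [1.0, 1.0]                 # previous frame's latent code
y0 = [1.5, 1.0]                     # hypothetical encoder output for this frame

y_star = refine_latent(y0, x, y_prev, weights)
before = objective(y0, x, y_prev, weights, 1.0, 1.0)
after = objective(y_star, x, y_prev, weights, 1.0, 1.0)
print(f"objective before: {before:.4f}, after: {after:.4f}")
```

The refinement happens entirely on the encoder side: the decoder receives a different latent code but runs the same fixed computation, which is why this kind of optimization need not slow decoding.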