Abstract: Frame extrapolation, i.e., predicting future frames from past (reference) frames, has been studied intensively in computer vision research and has great potential in video coding. Recently, a number of studies have applied deep networks to frame extrapolation, with some success. However, because the motion patterns in natural video are complex and diverse, it is still difficult to extrapolate high-fidelity frames directly from reference frames. To address this problem, we introduce reference frame alignment as a key technique for deep network-based frame extrapolation. We propose to first align the reference frames, e.g., using block-based motion estimation and motion compensation, and then to extrapolate from the aligned frames with a trained deep network. Because the alignment, performed as a preprocessing step, effectively reduces the diversity of the network input, we observe that the network is easier to train and that the extrapolated frames are of higher quality. We verify the proposed technique in video coding by using the extrapolated frame for inter prediction in High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC). We investigate different schemes, including whether to perform alignment between the target frame and the reference frames, and whether to perform motion estimation on the extrapolated frame. We conduct a comprehensive set of experiments to study the efficiency of the proposed method and to compare the different schemes. Experimental results show that our proposal achieves, on average, 5.3% and 2.8% BD-rate reduction on the Y component compared to HEVC under the low-delay P and low-delay B configurations, respectively. Our proposal also performs much better than frame extrapolation without reference frame alignment.
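The alignment step described above (block-based motion estimation and motion compensation applied to the reference frames before they enter the network) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: integer-pel full search with a sum-of-absolute-differences (SAD) cost, square blocks of a hypothetical size `bs`, a hypothetical search range `search`, frame dimensions that are multiples of `bs`, and older references aligned to the most recent one (an assumption; the abstract only says "align the reference frames"). The names `motion_estimate`, `motion_compensate`, and `align_references` are illustrative, and the trained extrapolation network itself is out of scope.

```python
from typing import List, Tuple

Frame = List[List[int]]                    # luma samples indexed as frame[y][x]
MotionField = List[List[Tuple[int, int]]]  # one (dx, dy) vector per block


def block_sad(cur: Frame, ref: Frame, bx: int, by: int,
              dx: int, dy: int, bs: int) -> int:
    """SAD between the bs x bs block of `cur` at (bx, by) and the
    block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(bs) for i in range(bs))


def motion_estimate(cur: Frame, ref: Frame, bs: int = 4,
                    search: int = 2) -> MotionField:
    """Integer-pel full-search block matching (hypothetical parameters)."""
    h, w = len(cur), len(cur[0])
    assert h % bs == 0 and w % bs == 0, "sketch assumes dims divisible by bs"
    field: MotionField = []
    for by in range(0, h, bs):
        row = []
        for bx in range(0, w, bs):
            # Start from the zero vector so ties favour "no motion".
            best_mv, best = (0, 0), block_sad(cur, ref, bx, by, 0, 0, bs)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Skip candidates that would read outside `ref`.
                    if not (0 <= by + dy and by + dy + bs <= h
                            and 0 <= bx + dx and bx + dx + bs <= w):
                        continue
                    cost = block_sad(cur, ref, bx, by, dx, dy, bs)
                    if cost < best:
                        best_mv, best = (dx, dy), cost
            row.append(best_mv)
        field.append(row)
    return field


def motion_compensate(ref: Frame, field: MotionField, bs: int = 4) -> Frame:
    """Build the aligned frame by copying each block of `ref` from its
    motion-displaced position."""
    h, w = len(ref), len(ref[0])
    out = [[0] * w for _ in range(h)]
    for r, row in enumerate(field):
        for c, (dx, dy) in enumerate(row):
            by, bx = r * bs, c * bs
            for j in range(bs):
                for i in range(bs):
                    out[by + j][bx + i] = ref[by + dy + j][bx + dx + i]
    return out


def align_references(refs: List[Frame], bs: int = 4,
                     search: int = 2) -> List[Frame]:
    """Align every older reference to the most recent one (refs[-1]).
    The returned list would be the (assumed) input to the extrapolation
    network, which this sketch does not model."""
    anchor = refs[-1]
    aligned = [motion_compensate(r, motion_estimate(anchor, r, bs, search), bs)
               for r in refs[:-1]]
    return aligned + [anchor]
```

For example, if `anchor` is `ref` shifted one pixel to the right, interior blocks receive the vector `(-1, 0)` and the aligned copy of `ref` matches `anchor` there. Aligning toward a single anchor removes much of the inter-frame motion from the network input, which is the reduction in input diversity the abstract credits for easier training; a real encoder would add fractional-pel search and proper border handling, both omitted here.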