Abstract: Previous methods for Video Frame Interpolation (VFI) suffer from blur and ghosting artifacts. These issues can be traced back to two pivotal factors: unavoidable motion errors and misalignment in supervision. In practice, motion estimates are often error-prone, resulting in misaligned features. Furthermore, the reconstruction loss tends to produce blurry results, particularly in misaligned regions. To mitigate these challenges, we propose a new paradigm called PerVFI (Perception-oriented Video Frame Interpolation). Our approach incorporates an Asymmetric Synergistic Blending (ASB) module that uses features from both sides to synergistically blend intermediate features. One reference frame emphasizes the primary content, while the other contributes complementary information. To impose a stringent constraint on the blending process, we introduce a self-learned sparse quasi-binary mask, which effectively mitigates ghosting and blur artifacts in the output. Additionally, we employ a normalizing-flow-based generator and use the negative log-likelihood loss to learn the conditional distribution of the output, which further facilitates the generation of clear and fine details. Experimental results validate the superiority of PerVFI, demonstrating significant improvements in perceptual quality compared to existing methods. Code is available at \url{
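A toy case illustrates why a reconstruction loss yields blurry results when supervision is misaligned. Assume two intermediate frames are equally plausible for the same input pair. Under a squared-error loss, the single prediction with the lowest expected loss is the per-pixel mean of the candidates. Here that mean shows the object twice at half intensity (ghosting). With many plausible positions, the same averaging spreads into blur.

The 1-D frames are hypothetical. Squared error stands in for whatever reconstruction loss a given method uses.

```python
# Two equally plausible 1-D intermediate frames (hypothetical data): a single
# bright pixel that could sit at x=3 or at x=5 at the intermediate time.
candidate_a = [0, 0, 0, 1, 0, 0, 0, 0]
candidate_b = [0, 0, 0, 0, 0, 1, 0, 0]
targets = [candidate_a, candidate_b]

def mse(pred, target):
    """Mean squared error between two equal-length frames."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def expected_mse(pred, targets):
    """Average MSE of one prediction against several equally likely targets."""
    return sum(mse(pred, t) for t in targets) / len(targets)

# Per-pixel mean of the candidates: the squared-error-optimal single prediction.
mean_frame = [(a + b) / 2 for a, b in zip(candidate_a, candidate_b)]

print(mean_frame)                         # [0.0, 0.0, 0.0, 0.5, 0.0, 0.5, 0.0, 0.0]
print(expected_mse(mean_frame, targets))  # 0.0625
print(expected_mse(candidate_a, targets)) # 0.125 (committing to one candidate is penalized)

# Sweep blend weights between the two candidates: the even 50/50 mix wins.
weights = [0.0, 0.25, 0.5, 0.75, 1.0]
best_w = min(weights, key=lambda w: expected_mse(
    [w * a + (1 - w) * b for a, b in zip(candidate_a, candidate_b)], targets))
print(best_w)                             # 0.5
```

The loss rewards hedging between the solutions rather than committing to a sharp one. This is the behavior that a likelihood-based objective on the output distribution is meant to avoid.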

What problem does this paper attempt to address?

This paper addresses the blurring and ghosting artifacts that are common in video frame interpolation (VFI). These artifacts stem mainly from two factors: unavoidable motion errors and temporal supervision misalignment.

1. **Unavoidable motion errors**: In principle, accurate motion estimation would yield satisfactory results. In practice, however, error-free pixel-level correspondence is very difficult to obtain, especially under large motion. The resulting feature misalignment degrades the quality of the generated intermediate frames.
2. **Temporal supervision misalignment**: During training, the ground-truth (GT) intermediate frame provides a reference only at one specific time point. In natural videos, however, the interval between two frames admits multiple plausible intermediate solutions. Similar inputs are therefore paired with different targets across training videos, which drives the network toward blurry, averaged results.

To solve these problems, the authors propose a new perception-oriented video frame interpolation method, PerVFI. Its main innovations are:

- **Asymmetric Synergistic Blending module (ASB)**: blends features from both sides synergistically. One reference frame supplies the main content and the other provides complementary information. To tightly constrain the blending, a self-learned sparse quasi-binary mask is introduced, which effectively reduces ghosting and blurring artifacts in the output.
- **Normalizing-flow-based generator**: decodes the intermediate features with a normalizing-flow-based generator. The generator models the conditional distribution of the output given the reference inputs and is trained with a negative log-likelihood loss, which further promotes the generation of clear details.
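The idea behind asymmetric, mask-gated blending can be sketched in 1-D. In the example below, one of the two motion estimates is off by one pixel:

- Averaging the two warped frames with equal weights produces two half-strength copies of the object (ghosting).
- A near-binary mask that favors the primary side keeps a single copy, at the position given by the primary side's (correct) motion estimate.

This is a minimal sketch under stated assumptions, not the paper's ASB module:

- Pixel values stand in for features.
- The mask logits are hand-set rather than self-learned.
- The steep-sigmoid parameterization and its `sharpness` parameter are hypothetical.
- Backward warping with linear interpolation is used as a generic warping step.

```python
import math

def backward_warp_1d(frame, flow):
    """Backward-warp a 1-D frame: out[x] = frame[x + flow[x]], linearly interpolated."""
    n = len(frame)
    out = []
    for x in range(n):
        pos = min(max(x + flow[x], 0.0), n - 1.0)  # clamp the sample position to the frame
        left = int(math.floor(pos))
        right = min(left + 1, n - 1)
        w = pos - left
        out.append((1 - w) * frame[left] + w * frame[right])
    return out

def quasi_binary_mask(logits, sharpness=10.0):
    """Steep sigmoid pushing mask values toward 0 or 1 (hypothetical parameterization)."""
    return [1.0 / (1.0 + math.exp(-sharpness * l)) for l in logits]

def blend(primary, complement, mask):
    """Mask ~1 keeps the primary side; mask ~0 borrows from the complementary side."""
    return [m * p + (1 - m) * c for m, p, c in zip(mask, primary, complement)]

# A bright pixel moves from x=2 (frame 0) to x=6 (frame 1); at t=0.5 it belongs at x=4.
frame0 = [0, 0, 1, 0, 0, 0, 0, 0]
frame1 = [0, 0, 0, 0, 0, 0, 1, 0]
n = len(frame0)

warped0 = backward_warp_1d(frame0, [-2.0] * n)  # correct flow toward frame 0
warped1 = backward_warp_1d(frame1, [+1.0] * n)  # flow toward frame 1 underestimated (should be +2)

symmetric = blend(warped0, warped1, [0.5] * n)  # equal weights: copies at x=4 and x=5
asymmetric = blend(warped0, warped1, quasi_binary_mask([1.0] * n))  # ~all weight on frame 0
```

In this sketch, the motion error on the complementary side is largely excluded instead of being averaged in. Where the primary side is unreliable (for example, in regions occluded in that frame), the mask would move toward 0 and take content from the complementary side instead. Keeping the mask near-binary makes such switches crisp rather than partial.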
Compared with GAN-based and diffusion-based methods, the normalizing-flow-based approach is more stable during training and has lower latency at inference. Experimental results verify PerVFI's significant advantage in perceptual quality. Its intermediate frames show higher visual quality in particular under large motion and temporal supervision misalignment.
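A normalizing flow is trained by exact maximum likelihood. An invertible map f sends the output x to a latent z with a simple prior, and the change-of-variables formula gives

-log p(x | c) = -log p_Z(f(x; c)) - log |det ∂f/∂x|.

The sketch below shows this with a one-dimensional conditional affine flow. The conditioning function mu(c) = a·c + b, the parameter names, and the scalar setting are illustrative assumptions. It is not PerVFI's generator, which operates on image features. Sampling here is a single inverse pass.

```python
import math

class AffineConditionalFlow1D:
    """Toy conditional normalizing flow: z = (x - mu(c)) / sigma.

    mu(c) = a * c + b and log_sigma are hypothetical learnable parameters that
    stand in for a conditioning network.
    """

    def __init__(self, a=1.0, b=0.0, log_sigma=0.0):
        self.a, self.b, self.log_sigma = a, b, log_sigma

    def forward(self, x, c):
        """Map data x to latent z; also return log|dz/dx| (the log-det-Jacobian)."""
        mu = self.a * c + self.b
        z = (x - mu) * math.exp(-self.log_sigma)
        return z, -self.log_sigma

    def nll(self, x, c):
        """Exact negative log-likelihood -log p(x | c) under a standard-normal prior."""
        z, log_det = self.forward(x, c)
        log_prior = -0.5 * z * z - 0.5 * math.log(2 * math.pi)
        return -(log_prior + log_det)

    def sample(self, c, z):
        """Inverse pass: one deterministic step from a latent z to an output x."""
        mu = self.a * c + self.b
        return mu + math.exp(self.log_sigma) * z

flow = AffineConditionalFlow1D(a=0.5, b=0.1, log_sigma=math.log(0.2))
x = flow.sample(c=1.0, z=0.3)  # generate an output for condition c
z, _ = flow.forward(x, c=1.0)  # invert it: recovers z close to 0.3
loss = flow.nll(x, c=1.0)      # training objective evaluated at a data point x
```

Because the map is invertible with a tractable Jacobian, the likelihood is computed exactly rather than through an adversarial discriminator or a variational bound. Generation needs only the inverse pass, in contrast to the many denoising steps a typical diffusion sampler runs.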

Perception-Oriented Video Frame Interpolation via Asymmetric Blending

Multiframe Interpolation for Video Using Phase Features

Video Frame Interpolation via Structure-Motion based Iterative Fusion

Motion-Aware Video Frame Interpolation

Video Frame Interpolation with Densely Queried Bilateral Correlation

Error-Aware Spatial Ensembles for Video Frame Interpolation

Video Frame Interpolation without Temporal Priors

13‐3: Invited Paper: Video Frame Interpolation Via Structure Motion Based Iterative Feature Fusion

Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation

Enhanced spatial-temporal freedom for video frame interpolation

SVFI: Spiking-Based Video Frame Interpolation for High-Speed Motion

Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation

Video Frame Interpolation: A Comprehensive Survey

Perceptual Quality Assessment for Video Frame Interpolation

Dynamic Frame Interpolation in Wavelet Domain

Efficiently Exploiting Spatially Variant Knowledge for Video Deblurring

IDO-VFI: Identifying Dynamics via Optical Flow Guidance for Video Frame Interpolation with Events

Blurry Video Frame Interpolation

Progressive Spatial-temporal Collaborative Network for Video Frame Interpolation

Three-Stage Cascade Framework for Blurry Video Frame Interpolation

Frame Interpolation with Consecutive Brownian Bridge Diffusion