Abstract: Existing Video Frame Interpolation (VFI) models tend to suffer from time-to-location ambiguity when trained with videos of non-uniform motion, such as accelerating, decelerating, and direction-changing motion, which often yields blurred interpolated frames. In this paper, we propose (i) a novel motion description map, the Bidirectional Motion field (BiM), to effectively describe non-uniform motions; (ii) a BiM-guided Flow Net (BiMFN) with a Content-Aware Upsampling Network (CAUN) for precise optical flow estimation; and (iii) Knowledge Distillation for VFI-centric Flow supervision (KDVCF) to supervise the motion estimation of the VFI model with VFI-centric teacher flows. The proposed model is called Bidirectional Motion field-guided VFI (BiM-VFI). Extensive experiments show that BiM-VFI significantly surpasses recent state-of-the-art VFI methods, with 26% and 45% improvements in LPIPS and STLPIPS, respectively, yielding interpolated frames with much less blur at arbitrary time instances.
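The link between time-to-location ambiguity and blur can be seen in a toy one-dimensional setting. The sketch below is an illustration only, not code from the paper: the trajectories, positions, and function names are hypothetical, and a squared-error loss is assumed as a simple stand-in for a reconstruction loss. Two trajectories share the same endpoints at t = 0 and t = 1, so a model given only the two source frames and t cannot distinguish them, and the squared-error-optimal single prediction is their average.

```python
"""Toy 1D illustration of time-to-location (TTL) ambiguity (hypothetical example)."""


def uniform_position(p0: float, p1: float, t: float) -> float:
    # Constant-velocity motion: position changes linearly with t.
    return p0 + t * (p1 - p0)


def accelerating_position(p0: float, p1: float, t: float) -> float:
    # Constant acceleration from rest: displacement grows with t**2,
    # yet the trajectory still reaches p1 at t = 1.
    return p0 + t * t * (p1 - p0)


def mse_optimal_prediction(targets: list[float]) -> float:
    # The constant that minimizes mean squared error over the targets is their mean.
    return sum(targets) / len(targets)


if __name__ == "__main__":
    p0, p1, t = 0.0, 8.0, 0.5
    a = uniform_position(p0, p1, t)
    b = accelerating_position(p0, p1, t)
    # Same endpoints, different positions at t = 0.5; the averaged
    # prediction matches neither trajectory.
    print(a, b, mse_optimal_prediction([a, b]))  # 4.0 2.0 3.0
```

In an image, the averaged position corresponds to a pixel smeared between the plausible locations, which is the blur the abstract describes.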
What problem does this paper attempt to address?
This paper addresses the Time-to-Location (TTL) ambiguity problem in Video Frame Interpolation (VFI), which is especially severe for videos with non-uniform motion (acceleration, deceleration, and changes of direction). Existing methods are affected by this ambiguity during training, which leads to severe blurring in the interpolated frames. To address it, the authors propose a novel Bidirectional Motion field (BiM) descriptor and a BiM-guided Flow Net (BiMFN), and supervise the motion estimation of the VFI model with a knowledge distillation strategy (KDVCF).
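To give some intuition for how a bidirectional description can separate these motion types, the sketch below computes two per-pixel quantities from the motion vectors pointing from a pixel's location at time t back to frame 0 and forward to frame 1: their magnitude ratio and the angle between them. This parameterization and the name `motion_descriptor` are assumptions chosen for illustration; the exact definition of BiM is given in the paper and may differ. Under uniform linear motion the ratio equals t and the angle is 180 degrees; acceleration or deceleration shifts the ratio away from t, and a change of direction moves the angle away from 180 degrees.

```python
import math


def motion_descriptor(
    v_t0: tuple[float, float], v_t1: tuple[float, float]
) -> tuple[float, float]:
    """Hypothetical (ratio, angle_deg) descriptor for one pixel.

    v_t0: displacement from the pixel's position at time t to its position in frame 0.
    v_t1: displacement from the pixel's position at time t to its position in frame 1.
    ratio = |v_t0| / (|v_t0| + |v_t1|); angle = angle between the vectors in degrees.
    """
    m0 = math.hypot(*v_t0)
    m1 = math.hypot(*v_t1)
    if m0 == 0.0 or m1 == 0.0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("descriptor undefined for zero-length motion vectors")
    ratio = m0 / (m0 + m1)
    cos_angle = (v_t0[0] * v_t1[0] + v_t0[1] * v_t1[1]) / (m0 * m1)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return ratio, angle


if __name__ == "__main__":
    # Frame 0 at (0, 0), frame 1 at (8, 0), t = 0.5.
    print(motion_descriptor((-4.0, 0.0), (4.0, 0.0)))  # uniform: ratio 0.5 (= t), 180 degrees
    print(motion_descriptor((-2.0, 0.0), (6.0, 0.0)))  # accelerating: ratio 0.25, 180 degrees
    # Frame 0 at (0, 0), frame 1 at (4, 4); the pixel moves right, then turns up.
    print(motion_descriptor((-4.0, 0.0), (0.0, 4.0)))  # direction change: ratio 0.5, 90 degrees
```

A linear-motion model implicitly assumes the first case for every pixel; the second and third cases show how a descriptor of this kind exposes the non-uniform motion that such a model would miss.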
### Specific problems and solutions:
1. **TTL ambiguity problem**:
- **Problem**: For video sequences with non-uniform motion, corresponding pixels in the two source frames can be connected by infinitely many possible trajectories, which makes it difficult to predict the actual target frame, especially at the inference stage.
- **Solution**: Introduce the Bidirectional Motion field (BiM) to describe non-uniform motion, including acceleration, deceleration, and changes of direction, and use BiM in VFI learning to restrict the solution space of possible motion trajectories.
2. **Inaccurate motion estimation**:
- **Problem**: Because of the TTL ambiguity, existing VFI models struggle to estimate motion accurately when the motion is non-uniform.
- **Solution**: Design the BiM-guided Flow Net (BiMFN) and the Content-Aware Upsampling Network (CAUN) to estimate bidirectional optical flow accurately.
3. **Insufficient training supervision**:
- **Problem**: Conditioning VFI learning only on the target time step causes the model to learn the average of all plausible outcomes, resulting in blurry interpolated frames.
- **Solution**: Propose Knowledge Distillation for VFI-centric Flow supervision (KDVCF), in which a teacher process produces more accurate BiM and flows that supervise the learning of the student process.
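A generic form of teacher-student flow supervision can be sketched as follows: the student's estimated flow is pulled toward the teacher's flow with an L1 term, added to an L1 photometric term on the interpolated frame. The loss choices, the weight `flow_weight`, the flattened-list data layout, and all function names are assumptions for illustration, not the KDVCF formulation from the paper. In a real training loop the teacher flow would be detached from the computation graph so that gradients reach only the student.

```python
Flow = list[tuple[float, float]]  # hypothetical layout: one (u, v) vector per pixel, flattened


def _check_same_length(a: list, b: list) -> None:
    if not a or len(a) != len(b):
        raise ValueError("inputs must be non-empty and of equal length")


def flow_distillation_loss(student: Flow, teacher: Flow) -> float:
    """Mean over pixels of the L1 distance between student and teacher flow vectors."""
    _check_same_length(student, teacher)
    total = sum(
        abs(su - tu) + abs(sv - tv)
        for (su, sv), (tu, tv) in zip(student, teacher)
    )
    return total / len(student)


def photometric_loss(pred: list[float], target: list[float]) -> float:
    """Mean absolute error between the interpolated frame and the ground-truth frame."""
    _check_same_length(pred, target)
    return sum(abs(p - g) for p, g in zip(pred, target)) / len(pred)


def student_objective(
    student_flow: Flow,
    teacher_flow: Flow,
    pred_frame: list[float],
    gt_frame: list[float],
    flow_weight: float = 1.0,
) -> float:
    # The teacher flow is used only as a fixed target here.
    return photometric_loss(pred_frame, gt_frame) + flow_weight * flow_distillation_loss(
        student_flow, teacher_flow
    )
```

The design point the sketch captures is that the flow term gives the student a direct target for its motion estimate, rather than relying only on the photometric error, which is ambiguous under non-uniform motion.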
### Main contributions:
- Propose a new motion descriptor, the Bidirectional Motion field (BiM), which effectively describes non-uniform motion.
- Introduce Knowledge Distillation for VFI-centric Flow supervision (KDVCF) to directly supervise optical flow estimation and photometric reconstruction.
- Design the BiM-guided Flow Net (BiMFN) and the Content-Aware Upsampling Network (CAUN) for accurate optical flow estimation.
- Experiments show that BiM-VFI improves on existing state-of-the-art methods by 26% in LPIPS and 45% in STLPIPS, significantly reducing blur in the interpolated frames.
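LPIPS and STLPIPS are distance metrics for which lower is better, so a percentage improvement is most naturally read as a relative reduction of the baseline score. Assuming that reading (the exact computation is not restated in this summary), it corresponds to the helper below; the function name is hypothetical, and no reported scores are reproduced here.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage by which `ours` lowers a lower-is-better metric relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - ours) / baseline
```

Under this reading, a 45% STLPIPS improvement means the new score is 55% of the baseline score.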
### Summary:
This paper addresses the TTL ambiguity caused by non-uniform motion in video frame interpolation with three components, BiM, BiMFN, and KDVCF, and significantly improves the quality of the interpolated frames.