EVA$^{2}$: Event-Assisted Video Frame Interpolation Via Cross-Modal Alignment and Aggregation

Zeyu Xiao, Wenming Weng, Yueyi Zhang, Zhiwei Xiong
DOI: https://doi.org/10.1109/tci.2022.3228747
IF: 5.4
2022-01-01
IEEE Transactions on Computational Imaging
Abstract: We consider the problem of event-assisted video frame interpolation (VFI), a new track for VFI that introduces event data, a novel sensing modality, into the process of generating intermediate frames from low-frame-rate videos. This new track challenges existing methods in two aspects: (1) how to utilize the event data to align boundary keyframes to intermediate ones, especially when there are corruptions in scenes (e.g., non-uniform motion, object occlusions, and illumination changes); (2) how to effectively utilize and aggregate cross-modal information to further mitigate corruptions and refine details. In this paper, we propose a novel Event-assisted VFI method with cross-modal Alignment and Aggregation, termed EVA$^{2}$, to address these challenges. First, to handle corruptions during alignment, we devise the cross-modal Event-Guided Alignment (EGA) module, in which the intermediate frames are aligned at both the feature and the image levels. The alignment operation in the EGA module is guided by the offset maps generated from the event data and by information extracted from the input boundary keyframes. Second, we propose the cross-modal Event-aware Dynamic Aggregation (EDA) module, in which an event-aware dynamic convolution operation adaptively aggregates the event data with the aligned results for further improvements. Extensive experiments on both synthetic and real-world datasets validate the effectiveness of EVA$^{2}$.
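The event-aware dynamic convolution mentioned for the EDA module can be pictured as per-pixel filtering: each output pixel is a weighted sum of its neighbourhood in the aligned result, with weights that depend on the event data at that location. The sketch below is only an illustration of that general idea, not the authors' implementation. The function names `predict_kernels` and `dynamic_aggregate`, the softmax rule that stands in for a learned kernel-prediction network, and the single-channel 2D inputs are all assumptions made for the example.

```python
import math


def predict_kernels(events, k=3):
    """Hypothetical stand-in for a learned kernel predictor.

    Returns one softmax-normalised k*k kernel per pixel, computed from the
    local patch of the event map (positions outside the map count as 0).
    """
    h, w = len(events), len(events[0])
    r = k // 2
    kernels = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            logits = []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < h and 0 <= xx < w
                    # Assumed rule: stronger event activity lowers the weight.
                    logits.append(-abs(events[yy][xx]) if inside else 0.0)
            m = max(logits)
            exps = [math.exp(v - m) for v in logits]
            total = sum(exps)
            kernels[y][x] = [e / total for e in exps]
    return kernels


def dynamic_aggregate(aligned, kernels, k=3):
    """Apply a different k*k kernel at every pixel of the aligned map.

    Borders are handled by replicating the nearest edge pixel.
    """
    h, w = len(aligned), len(aligned[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, i = 0.0, 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernels[y][x][i] * aligned[yy][xx]
                    i += 1
            out[y][x] = acc
    return out


if __name__ == "__main__":
    events = [[0, 1, 0], [2, 0, -1], [0, 0, 0]]      # toy event map
    aligned = [[0.1, 0.5, 0.9], [0.2, 0.6, 1.0], [0.3, 0.7, 1.1]]
    refined = dynamic_aggregate(aligned, predict_kernels(events))
    for row in refined:
        print(row)
```

Unlike an ordinary convolution, which shares one kernel across the whole image, the kernel here changes from pixel to pixel according to the event map. In a trained network the kernels would be produced by learned layers and applied to multi-channel features rather than to a single 2D array.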