Abstract: Traditional frame-based video frame interpolation (VFI) methods rely on the linear motion assumption and the brightness invariance assumption, which can lead to severe errors in scenarios with high-speed motion. To tackle this challenge, and inspired by the ability of event cameras to asynchronously record brightness changes at each pixel, we propose SuperFast, a Fast-Slow joint synthesis framework for event-enhanced high-speed video frame interpolation. SuperFast generates high frame rate (5000 FPS, 200× faster) video from an input low frame rate (25 FPS) video and the corresponding event stream. In our framework, the task is divided into two sub-tasks, i.e., video frame interpolation for content with and without high-speed motion, which are handled by two corresponding branches: the fast synthesis pathway and the slow synthesis pathway. The fast synthesis pathway uses a spiking neural network to encode the input event stream and combines the encoding with the boundary frames to generate intermediate results through synthesis and refinement, targeting content with high-speed motion. The slow synthesis pathway stacks the two input boundary frames and the event stream to synthesize intermediate results, focusing on relatively slow-motion content. Finally, a fusion module with a comparison loss is used to generate the final video frame interpolation results. We also build a hybrid visual acquisition system containing an event camera and a high frame rate camera, and collect the first 5000 FPS High-Speed Event-enhanced Video frame Interpolation (THU^HSEVI) dataset. To evaluate the proposed framework, we conduct experiments on our THU^HSEVI dataset and the existing HS-ERGB dataset. Experimental results demonstrate that the proposed framework achieves state-of-the-art 200× video frame interpolation performance in high-speed motion scenarios.
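
The abstract describes the data flow only at a high level. The Python sketch below illustrates four ingredients under stated assumptions: the timestamps needed for 200× interpolation between two 25 FPS frames, binning an event stream into equal time slices (the kind of tensor a slow pathway could stack with the boundary frames), a leaky integrate-and-fire (LIF) neuron as a minimal stand-in for spiking encoding, and a per-pixel mask blend as a stand-in for fusion. All names (`Event`, `intermediate_timestamps`, `bin_events`, `lif_spikes`, `fuse`) are hypothetical. The binning scheme, the LIF neuron with hard reset, and the mask-weighted fusion are generic techniques chosen for illustration. They are not the SuperFast architecture or its comparison loss.

```python
# Illustrative sketch only: generic stand-ins, not the SuperFast architecture.
from dataclasses import dataclass
from typing import List

Grid = List[List[float]]  # one 2D image indexed as grid[y][x]


@dataclass
class Event:
    t: float  # timestamp in seconds
    x: int    # pixel column
    y: int    # pixel row
    p: int    # polarity: +1 brightness increase, -1 decrease


def intermediate_timestamps(t0: float, t1: float, factor: int) -> List[float]:
    """Times of the factor - 1 frames to synthesize strictly between t0 and t1."""
    step = (t1 - t0) / factor
    return [t0 + k * step for k in range(1, factor)]


def bin_events(events: List[Event], t0: float, t1: float,
               num_bins: int, height: int, width: int) -> List[Grid]:
    """Sum event polarities per pixel into num_bins equal time slices of [t0, t1)."""
    bins = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    duration = t1 - t0
    for e in events:
        if not (t0 <= e.t < t1):
            continue  # ignore events outside the interval
        b = min(int((e.t - t0) / duration * num_bins), num_bins - 1)
        bins[b][e.y][e.x] += e.p
    return bins


def lif_spikes(bins: List[Grid], threshold: float = 1.0,
               decay: float = 0.5) -> List[List[List[int]]]:
    """Assumed LIF neuron per pixel, stepped over the time bins.

    v <- decay * v + |input|; emit a spike and hard-reset v to 0 once v >= threshold.
    """
    height, width = len(bins[0]), len(bins[0][0])
    v = [[0.0] * width for _ in range(height)]  # membrane potentials
    spikes = []
    for frame in bins:
        out = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                v[y][x] = decay * v[y][x] + abs(frame[y][x])
                if v[y][x] >= threshold:
                    out[y][x] = 1
                    v[y][x] = 0.0
        spikes.append(out)
    return spikes


def fuse(fast: Grid, slow: Grid, mask: Grid) -> Grid:
    """Assumed per-pixel blend: mask near 1 favors the fast pathway, near 0 the slow one."""
    return [[m * f + (1.0 - m) * s for f, s, m in zip(fr, sr, mr)]
            for fr, sr, mr in zip(fast, slow, mask)]
```

For a 25 FPS frame pair 40 ms apart, `intermediate_timestamps(0.0, 0.04, 200)` returns 199 timestamps spaced 0.2 ms apart. Together with the two boundary frames, this gives 200 intervals per input interval, i.e. 5000 FPS output.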

SuperFast: 200× Video Frame Interpolation Via Event Camera