ReBotNet: Fast Real-time Video Enhancement

Jeya Maria Jose Valanarasu, Rahul Garg, Andeep Toor, Xin Tong, Weijuan Xi, Andreas Lugmayr, Vishal M. Patel, Anne Menini
2023-03-24
Abstract: Most video restoration networks are slow, have high computational load, and can't be used for real-time video enhancement. In this work, we design an efficient and fast framework to perform real-time video enhancement for practical use-cases like live video calls and video streams. Our proposed method, called Recurrent Bottleneck Mixer Network (ReBotNet), employs a dual-branch framework. The first branch learns spatio-temporal features by tokenizing the input frames along the spatial and temporal dimensions using a ConvNext-based encoder and processing these abstract tokens using a bottleneck mixer. To further improve temporal consistency, the second branch employs a mixer directly on tokens extracted from individual frames. A common decoder then merges the features from the two branches to predict the enhanced frame. In addition, we propose a recurrent training approach where the last frame's prediction is leveraged to efficiently enhance the current frame while improving temporal consistency. To evaluate our method, we curate two new datasets that emulate real-world video call and streaming scenarios, and show extensive results on multiple datasets where ReBotNet outperforms existing approaches with lower computations, reduced memory requirements, and faster inference time.
Computer Vision and Pattern Recognition
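The recurrent scheme described in the abstract can be illustrated with a minimal streaming loop. This is only a sketch under stated assumptions, not the authors' implementation: `enhance_stream`, `toy_model`, and the `Frame` alias are hypothetical names, frames are plain lists of floats rather than image tensors, the network is replaced by an element-wise average, and the first frame is assumed to reuse the degraded input in place of the missing previous prediction.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Frame = List[float]  # hypothetical stand-in for an image tensor


def enhance_stream(
    frames: Iterable[Frame],
    model: Callable[[Frame, Frame], Frame],
) -> Iterator[Frame]:
    """Yield one enhanced frame per input frame, using only past information."""
    previous_output: Optional[Frame] = None
    for current in frames:
        # Assumption: the very first frame has no earlier prediction,
        # so the degraded frame itself stands in for it.
        reference = current if previous_output is None else previous_output
        # The model sees the previous prediction and the current degraded frame.
        previous_output = model(reference, current)
        yield previous_output


def toy_model(previous_prediction: Frame, current: Frame) -> Frame:
    """Placeholder for the network: element-wise average of its two inputs."""
    return [(p + c) / 2.0 for p, c in zip(previous_prediction, current)]


noisy_frames = [[0.0, 4.0], [2.0, 0.0], [4.0, 2.0]]
enhanced = list(enhance_stream(noisy_frames, toy_model))
```

Because the loop never looks ahead, each output can be emitted as soon as its input frame arrives, and the only state carried between steps is the single previous prediction.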
What problem does this paper attempt to address?
The paper targets the poor performance of existing video enhancement networks in real-time applications. It identifies three specific issues:

1. **Slow speed and high computational load**: Most existing video restoration networks process frames slowly and at high computational cost, so they cannot be used for real-time video enhancement such as live video calls and video streaming.
2. **Latency from future-frame inputs**: Many methods require both past and future frames as input, which introduces latency in streaming video and hurts real-time performance.
3. **Multiple degradations**: Real-world videos usually suffer from several degradations at once (noise, blur, compression artifacts, and so on), whereas existing video restoration methods are often optimized for a single type of degradation.

To address these problems, the authors propose an efficient framework, the Recurrent Bottleneck Mixer Network (ReBotNet), for fast real-time video enhancement. ReBotNet addresses them as follows:

- **Dual-branch architecture**: The first branch extracts spatio-temporal features with a ConvNext-based encoder and processes them with a bottleneck mixer; the second branch applies a mixer directly to tokens extracted from individual frames to further improve temporal consistency.
- **Recurrent training**: The prediction for the previous frame is used as an additional input, which improves the temporal consistency and efficiency of the current frame's prediction while reducing the number of input frames needed and the computational cost.
- **New datasets**: To better emulate real-world scenarios, the authors curate two new datasets, PortraitVideo and FullVideo, which contain cropped face videos and complete low-quality videos respectively, for evaluating the model in different settings.

With these contributions, ReBotNet significantly reduces computational resource requirements and speeds up inference while maintaining high-quality video enhancement, making it suitable for real-world applications such as real-time video calls and streaming media.
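The latency cost of future-frame inputs mentioned above follows from simple arithmetic: a method that needs k future frames cannot start on a frame until those k frames have arrived. The sketch below works this out; the function names, the 30 fps frame rate, and the two-frame lookahead are illustrative assumptions rather than figures from the paper, and the result excludes the network's own compute time.

```python
def lookahead_latency_ms(future_frames: int, fps: float) -> float:
    """Minimum buffering delay, in milliseconds, before a frame can be processed."""
    if future_frames < 0 or fps <= 0:
        raise ValueError("future_frames must be >= 0 and fps must be > 0")
    # Each future frame arrives one frame interval (1000 / fps ms) later.
    return 1000.0 * future_frames / fps


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, to keep up with the stream."""
    if fps <= 0:
        raise ValueError("fps must be > 0")
    return 1000.0 / fps


# Illustrative numbers only: a 30 fps stream with a two-frame lookahead.
bidirectional_delay = lookahead_latency_ms(future_frames=2, fps=30.0)
causal_delay = lookahead_latency_ms(future_frames=0, fps=30.0)
budget = frame_budget_ms(30.0)
print(f"lookahead delay: {bidirectional_delay:.1f} ms, "
      f"causal delay: {causal_delay:.1f} ms, per-frame budget: {budget:.1f} ms")
```

A causal design that relies only on the current frame and the previous prediction pays no buffering delay, so its latency is bounded by inference time alone.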