Deformable Convolutions and LSTM-based Flexible Event Frame Fusion Network for Motion Deblurring

Dan Yang, Mehmet Yamac
2023-06-01
Abstract: Event cameras differ from conventional RGB cameras in that they produce asynchronous data sequences. While RGB cameras capture every frame at a fixed rate, event cameras only capture changes in the scene, resulting in sparse and asynchronous data output. Although event data carries useful information for motion deblurring of RGB images, integrating event and image information remains a challenge. Recent state-of-the-art CNN-based deblurring solutions produce multiple 2-D event frames by accumulating event data over a time period. In most of these techniques, however, the number of event frames is fixed and predefined, which reduces temporal resolution drastically, particularly in scenarios where fast-moving objects are present or longer exposure times are required. Modern cameras (e.g., cameras in mobile phones) also set the exposure time of the image dynamically, which presents an additional problem for networks developed for a fixed number of event frames. To address these challenges, we developed a Long Short-Term Memory (LSTM)-based event feature extraction module that enables the use of a dynamically varying number of event frames. Using these modules, we constructed a state-of-the-art deblurring network, Deformable Convolutions and LSTM-based Flexible Event Frame Fusion Network (DLEFNet). It is particularly useful for scenarios in which exposure times vary depending on factors such as lighting conditions or the presence of fast-moving objects in the scene. Evaluation results demonstrate that the proposed method outperforms existing state-of-the-art networks on the deblurring task on both synthetic and real-world datasets.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses motion deblurring in dynamic scenes, using data from event cameras to improve deblurring performance. Specifically, it targets the following challenges:

1. **Fusion of Event Data and RGB Image Data**: Event cameras record only changes in the scene, so their output is sparse and asynchronous. Although event data is very useful for motion deblurring of RGB images, effectively combining it with image information remains a challenge.
2. **Fixed Number of Event Frames Limitation**: Existing convolutional neural network (CNN)-based deblurring solutions typically assume a fixed exposure time and create multiple 2-D event frames based on this assumption. The number of event frames is therefore usually fixed, which significantly reduces temporal resolution, especially with fast-moving objects or longer exposure times.
3. **Dynamic Exposure Time of Modern Cameras**: Modern cameras (such as those in smartphones) set exposure times dynamically, which poses an additional challenge for networks that assume a fixed number of event frames.

To address these issues, the authors propose a new network architecture, the Deformable Convolutions and LSTM-based Flexible Event Frame Fusion Network (DLEFNet). The network combines an LSTM-based event feature extraction module, which accepts a dynamically varying number of event frames, with deformable convolutions. DLEFNet also incorporates encoded features of the RGB frames at multiple scales. The network is particularly suited to scenarios involving fast-moving objects and varying exposure times.

Experimental results show that DLEFNet outperforms existing state-of-the-art networks on both the synthetic GoPro dataset and the real-world REBlur dataset.
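
The core idea described above, where the number of event frames follows the exposure time while a recurrent module still produces a fixed-size feature, can be illustrated with a small sketch. This is not the authors' implementation: the event tuple layout `(t_us, x, y, polarity)`, the bin width, the helper names `accumulate_events`, `TinyLSTMCell`, and `encode_frames`, and the plain fully connected LSTM cell with random, untrained weights are all assumptions made for illustration. DLEFNet's actual module and its use of deformable convolutions are not specified in this summary.

```python
import math
import random


def accumulate_events(events, t_start, t_end, bin_width, height, width):
    """Bin (t_us, x, y, polarity) events into 2-D frames of signed counts.

    Times are integer microseconds (an assumption). The number of frames
    grows with the exposure window instead of being fixed in advance.
    """
    num_frames = max(1, -(-(t_end - t_start) // bin_width))  # ceiling division
    frames = [[[0] * width for _ in range(height)] for _ in range(num_frames)]
    for t, x, y, p in events:
        if not (t_start <= t < t_end):
            continue  # event falls outside the exposure window
        idx = min((t - t_start) // bin_width, num_frames - 1)
        frames[idx][y][x] += 1 if p > 0 else -1
    return frames


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyLSTMCell:
    """Plain LSTM cell with random weights; a stand-in, not the paper's module."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.w = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        xh = list(x) + list(h)  # concatenate input and previous hidden state

        def gate(name, act):
            return [act(sum(w * v for w, v in zip(row, xh)) + b)
                    for row, b in zip(self.w[name], self.b[name])]

        i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
        g = gate("g", math.tanh)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def encode_frames(cell, frames):
    """Fold any number of event frames into one fixed-size hidden state."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for frame in frames:
        x = [float(v) for row in frame for v in row]  # flatten the 2-D frame
        h, c = cell.step(x, h, c)
    return h


# Toy events: (time in microseconds, x, y, polarity)
events = [(1000, 0, 0, 1), (12000, 1, 1, -1), (25000, 2, 0, 1), (44000, 3, 1, 1)]
short_exp = accumulate_events(events, 0, 30000, 10000, height=2, width=4)
long_exp = accumulate_events(events, 0, 50000, 10000, height=2, width=4)

cell = TinyLSTMCell(input_size=2 * 4, hidden_size=4)
print(len(short_exp), len(long_exp))  # 3 5
print(len(encode_frames(cell, short_exp)), len(encode_frames(cell, long_exp)))  # 4 4
```

The 30 ms and 50 ms exposures yield three and five event frames respectively, yet both are encoded into a hidden state of the same length. That property is what lets downstream layers stay unchanged when the camera varies its exposure time.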