EMatch: A Unified Framework for Event-based Optical Flow and Stereo Matching

Pengjie Zhang, Lin Zhu, Xiao Wang, Lizhi Wang, Wanxuan Lu, Hua Huang
2024-11-22
Abstract: Event cameras have shown promise in vision applications like optical flow estimation and stereo matching, with many specialized architectures leveraging the asynchronous and sparse nature of event data. However, existing works only focus on event data within the confines of task-specific domains, overlooking how tasks across the temporal and spatial domains can reinforce each other. In this paper, we reformulate event-based flow estimation and stereo matching as a unified dense correspondence matching problem, enabling us to solve both tasks within a single model by directly matching features in a shared representation space. Specifically, our method utilizes a Temporal Recurrent Network to aggregate event features across temporal or spatial domains, and a Spatial Contextual Attention to enhance knowledge transfer across event flows via temporal or spatial interactions. By utilizing a shared feature similarities module that integrates knowledge from event streams via temporal or spatial interactions, our network performs optical flow estimation from temporal event segment inputs and stereo matching from spatial event segment inputs simultaneously. We demonstrate that our unified model inherently supports multi-task fusion and cross-task transfer. Without the need for retraining for a specific task, our model can effectively handle both optical flow and stereo estimation, achieving state-of-the-art performance on both tasks.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to handle optical flow estimation and stereo matching for event camera data within a single framework. Most existing research focuses on one task and ignores how tasks in the temporal domain and the spatial domain can reinforce each other. The paper proposes EMatch, a framework that reformulates both tasks as a dense correspondence matching problem in a shared representation space, so that a single model can handle both without retraining for a specific task. According to the paper, this supports multi-task fusion and cross-task knowledge transfer, and achieves state-of-the-art performance on both tasks.

The main contributions of the paper are as follows:

1. **Proposing EMatch**: a new event-based framework that unifies optical flow estimation and stereo matching in a shared representation space through dense correspondence matching. The framework bridges the gap between temporal awareness and spatial awareness, enabling the model to handle motion and stereo estimation simultaneously.
2. **Introducing two key modules**: the Temporal Recurrent Network (TRN) and the Spatial Contextual Attention (SCA). Together, TRN and SCA aggregate features across the temporal and spatial domains to produce a unified feature map for dense correspondence matching.
3. **Supporting multi-task fusion and cross-task transfer**: EMatch realizes multi-task fusion and cross-task transfer in a single unified architecture and achieves state-of-the-art performance on both optical flow estimation and stereo matching.

The paper evaluates EMatch on the DSEC benchmark, demonstrating its advantages in multi-task fusion and cross-task transfer.
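To make the idea of dense correspondence matching in a shared feature space more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that per-pixel feature maps have already been extracted (in EMatch this would be the role of TRN and SCA, which are not modeled here), and it uses a softmax over feature similarities as one common way to turn similarities into correspondences. The function names `match_flow` and `match_stereo`, the scaling by `sqrt(C)`, and the left-to-right disparity convention are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def match_flow(feat0, feat1):
    """Optical flow as 2D matching between two feature maps of shape (H, W, C).

    Every pixel of feat0 is compared with every pixel of feat1. The flow is
    the similarity-weighted average target position minus the source position.
    Returns an (H, W, 2) array holding (dx, dy).
    """
    H, W, C = feat0.shape
    f0 = feat0.reshape(H * W, C)
    f1 = feat1.reshape(H * W, C)
    corr = f0 @ f1.T / np.sqrt(C)          # (HW, HW) feature similarities
    prob = softmax(corr, axis=-1)          # matching distribution per pixel
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(H * W, 2).astype(np.float64)
    matched = prob @ grid                  # expected matched coordinates
    return (matched - grid).reshape(H, W, 2)


def match_stereo(feat_left, feat_right):
    """Stereo as 1D matching along each image row, features of shape (H, W, C).

    Assumes rectified views, so a left pixel at column x corresponds to a
    right pixel at column x - d with disparity d >= 0.
    Returns an (H, W) disparity map.
    """
    H, W, C = feat_left.shape
    corr = np.einsum("hic,hjc->hij", feat_left, feat_right) / np.sqrt(C)
    # Forbid matches to the right of the source column (negative disparity).
    invalid = np.triu(np.ones((W, W), dtype=bool), k=1)
    corr = np.where(invalid[None], -np.inf, corr)
    prob = softmax(corr, axis=-1)          # (H, W, W)
    xs = np.arange(W, dtype=np.float64)
    matched_x = prob @ xs                  # expected matched column, (H, W)
    return xs[None, :] - matched_x
```

Both functions share the same similarity-plus-softmax core and differ only in the search space: all pixels for flow, a single row for stereo. That shared core is the sense in which the two tasks can be treated as one matching problem; how EMatch actually computes and refines its matches is described in the paper itself.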