Abstract: Event cameras have shown promise in vision applications like optical flow estimation and stereo matching, with many specialized architectures leveraging the asynchronous and sparse nature of event data. However, existing works treat event data only within the confines of task-specific domains, overlooking how tasks across the temporal and spatial domains can reinforce each other. In this paper, we reformulate event-based flow estimation and stereo matching as a unified dense correspondence matching problem, enabling us to solve both tasks within a single model by directly matching features in a shared representation space. Specifically, our method utilizes a Temporal Recurrent Network to aggregate event features across temporal or spatial domains, and a Spatial Contextual Attention to enhance knowledge transfer across event flows via temporal or spatial interactions. By utilizing a shared feature similarities module that integrates knowledge from event streams via temporal or spatial interactions, our network performs optical flow estimation from temporal event segment inputs and stereo matching from spatial event segment inputs simultaneously. We demonstrate that our unified model inherently supports multi-task fusion and cross-task transfer. Without retraining for a specific task, our model can effectively handle both optical flow and stereo estimation, achieving state-of-the-art performance on both tasks.

What problem does this paper attempt to address?

This paper addresses how to handle optical flow estimation and stereo matching on event camera data in a unified way. Most existing research focuses on a single task and ignores how tasks in the temporal domain and the spatial domain can reinforce each other. The paper proposes a new framework, EMatch, which casts both tasks as dense correspondence matching in a shared representation space, so that a single model can process them simultaneously without retraining for a specific task. This supports multi-task fusion and cross-task knowledge transfer, and the paper reports state-of-the-art performance on both tasks.

Specifically, the main contributions of the paper are as follows:

1. **Proposing EMatch**: a new event-based framework that unifies optical flow estimation and stereo matching in a shared representation space through dense correspondence matching. The framework bridges the gap between temporal and spatial perception, enabling the model to handle motion and stereo estimation simultaneously.
2. **Introducing two key modules**: the Temporal Recurrent Network (TRN) and the Spatial Contextual Attention (SCA). Through feature aggregation in the temporal and spatial domains, TRN and SCA generate a unified feature map for dense correspondence matching.
3. **Supporting multi-task fusion and cross-task transfer**: EMatch realizes multi-task fusion and cross-task transfer in a single unified architecture and achieves state-of-the-art performance on optical flow estimation and stereo matching.

The paper evaluates EMatch on the DSEC benchmark, demonstrating its advantages in multi-task fusion and cross-task transfer.
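To make the "dense correspondence matching" idea concrete, the following is a minimal sketch, not the EMatch implementation. It assumes that the shared similarity step is a global dot-product correlation followed by a softmax, and that features are already extracted (the TRN and SCA modules are not modeled). The function names `match_flow` and `match_stereo` are hypothetical. The point it illustrates is that optical flow and stereo differ only in the search space: flow searches over all 2D positions in the second temporal segment, while stereo searches along the same row of the other view.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries set to -inf become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def match_flow(feat_a, feat_b):
    """2D matching between two (H, W, C) feature maps.

    Returns a flow field of shape (H, W, 2) holding (dx, dy) per pixel.
    """
    H, W, C = feat_a.shape
    a = feat_a.reshape(H * W, C)
    b = feat_b.reshape(H * W, C)
    corr = a @ b.T / np.sqrt(C)            # (HW, HW) feature similarities
    prob = softmax(corr, axis=-1)          # matching distribution per pixel
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(H * W, 2).astype(np.float64)
    matched = prob @ grid                  # expected matching coordinates
    return (matched - grid).reshape(H, W, 2)

def match_stereo(feat_left, feat_right):
    """1D matching along rows between two (H, W, C) feature maps.

    Returns a disparity map of shape (H, W), assuming rectified views so
    that a left pixel at column x matches a right pixel at column x - d.
    """
    H, W, C = feat_left.shape
    # Similarity of each left pixel with every right pixel in the same row.
    corr = np.einsum("hwc,hvc->hwv", feat_left, feat_right) / np.sqrt(C)
    xs = np.arange(W)
    valid = xs[None, :] <= xs[:, None]     # only candidates with x_right <= x_left
    corr = np.where(valid[None], corr, -np.inf)
    prob = softmax(corr, axis=-1)
    matched_x = prob @ xs                  # expected matching column, (H, W)
    return xs[None, :] - matched_x

# Toy check with perfectly distinctive (one-hot) features on a 4x5 grid.
H, W = 4, 5
feat = 50.0 * np.eye(H * W).reshape(H, W, H * W)
flow = match_flow(feat, np.roll(feat, 1, axis=1))      # content moved 1 px right
disp = match_stereo(feat, np.roll(feat, -2, axis=1))   # right view shifted by 2 px
```

In this toy setup the same similarity computation recovers a horizontal flow of about one pixel and a disparity of about two pixels away from the image borders, which is the sense in which both tasks can share one matching module.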

EMatch: A Unified Framework for Event-based Optical Flow and Stereo Matching
