Cross-modal Learning for Optical Flow Estimation with Events

Chi Zhang, Chenxu Jiang, Lei Yu
DOI: https://doi.org/10.1016/j.sigpro.2024.109580
IF: 4.729
2024-01-01
Signal Processing
Abstract: Benefiting from their low latency and high dynamic range, event cameras have recently been adopted for Optical Flow (OF) prediction in harsh environments with high-speed motion or extreme lighting conditions. However, events respond only to brightness changes, mostly at edges with high intensity contrast, which makes OF prediction ill-posed in non-edge or low-contrast regions. To address this problem, we propose a network that learns OF by exploiting the complementarity between the frame and event domains. In particular, we first extract cross-modal features from frames and events via a fusion-based encoder and then apply a self-attention module with cross-dimension interaction to suppress disturbances from noisy events. Furthermore, a long-term memory recurrent module is built to combine local and global temporal information and to enhance the cross-modal features. Finally, a CNN decoder is designed to upscale the cross-modal features and provide pyramidal results for multi-scale loss computation. Extensive experiments demonstrate that the proposed method achieves leading performance compared with state-of-the-art OF models.
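The abstract mentions pyramidal results for multi-scale loss computation but does not state the loss itself. As a rough illustration only, the sketch below computes a weighted average endpoint error over a pyramid of flow predictions using NumPy. The function names, the average-pooling downsampler, the factor-of-two pyramid, and the per-level weights are all assumptions made for this example, not the authors' formulation.

```python
import numpy as np

def average_pool_flow(flow, factor):
    """Downsample an (H, W, 2) flow field by average pooling.

    The flow vectors are divided by `factor` because a displacement
    measured in pixels shrinks with the image resolution.
    """
    h, w, _ = flow.shape
    assert h % factor == 0 and w % factor == 0, "size must be divisible by factor"
    pooled = flow.reshape(h // factor, factor, w // factor, factor, 2).mean(axis=(1, 3))
    return pooled / factor

def endpoint_error(pred, target):
    """Mean Euclidean distance between predicted and target flow vectors."""
    return float(np.linalg.norm(pred - target, axis=-1).mean())

def multiscale_epe_loss(preds, target, weights):
    """Weighted sum of endpoint errors over a prediction pyramid.

    preds   : list of (H / 2**i, W / 2**i, 2) arrays, finest level first
    target  : (H, W, 2) ground-truth flow at full resolution
    weights : one hypothetical weight per pyramid level
    """
    total = 0.0
    for level, (pred, weight) in enumerate(zip(preds, weights)):
        target_at_level = average_pool_flow(target, 2 ** level)
        total += weight * endpoint_error(pred, target_at_level)
    return total
```

Supervising each pyramid level against a matching downsampled target is a common choice in coarse-to-fine flow networks; whether the paper weights or downsamples in this way is not stated in the abstract.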