Abstract: Current optical flow and point-tracking methods rely heavily on synthetic datasets. Event cameras are novel vision sensors with advantages in challenging visual conditions, but state-of-the-art frame-based methods cannot be easily adapted to event data due to the limitations of current event simulators. We introduce a novel self-supervised loss combining the Contrast Maximization framework with a non-linear motion prior in the form of pixel-level trajectories and propose an efficient solution to solve the high-dimensional assignment problem between non-linear trajectories and events. Their effectiveness is demonstrated in two scenarios: In dense continuous-time motion estimation, our method improves the zero-shot performance of a synthetically trained model on the real-world dataset EVIMO2 by 29%. In optical flow estimation, our method elevates a simple UNet to achieve state-of-the-art performance among self-supervised methods on the DSEC optical flow benchmark. Our code is available at <a class="link-external link-https" href="https://github.com/tub-rip/MotionPriorCMax" rel="external noopener nofollow">this https URL</a>.
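
The Contrast Maximization idea named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' loss: it assumes a single constant velocity for all events (the paper uses non-linear per-pixel trajectories), and the helper names `image_of_warped_events`, `contrast`, and `make_edge_events` are hypothetical. It only shows why a correct motion hypothesis yields a sharper, higher-variance image of warped events.

```python
import numpy as np

def image_of_warped_events(xs, ys, ts, velocity, t_ref, shape):
    """Warp events to t_ref along one constant velocity and count them per pixel.

    Assumption for illustration: a single linear motion for all events.
    """
    wx = np.rint(xs - (ts - t_ref) * velocity[0]).astype(int)
    wy = np.rint(ys - (ts - t_ref) * velocity[1]).astype(int)
    h, w = shape
    inside = (wx >= 0) & (wx < w) & (wy >= 0) & (wy < h)
    iwe = np.zeros(shape, dtype=np.float64)
    # np.add.at accumulates correctly when several events hit the same pixel.
    np.add.at(iwe, (wy[inside], wx[inside]), 1.0)
    return iwe

def contrast(iwe):
    """Variance of the image of warped events, a common contrast measure."""
    return float(np.var(iwe))

def make_edge_events(speed, n_times=20, height=32, x0=5.0):
    """Synthetic events from a vertical edge moving horizontally at `speed`."""
    ts = np.repeat(np.linspace(0.0, 1.0, n_times), height)
    ys = np.tile(np.arange(height, dtype=float), n_times)
    xs = x0 + speed * ts
    return xs, ys, ts

xs, ys, ts = make_edge_events(speed=10.0)
shape = (32, 32)
# Warping with the true motion aligns all events on one column (sharp image).
c_true = contrast(image_of_warped_events(xs, ys, ts, (10.0, 0.0), 0.0, shape))
# Warping with zero motion leaves the edge smeared over many columns.
c_zero = contrast(image_of_warped_events(xs, ys, ts, (0.0, 0.0), 0.0, shape))
print(c_true > c_zero)
```

A self-supervised loss in this spirit would reward motion estimates that increase the contrast, which requires no ground-truth flow labels.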

What problem does this paper attempt to address?

The main problem this paper addresses is **long-term, dense motion estimation with event cameras**, in particular reducing the domain-adaptation gap (the gap between simulated and real data) that limits previous methods. Specifically, the paper focuses on using event-camera data for dense continuous-time motion estimation in scenes with complex, non-linear motion. This requires overcoming the lack of large-scale labeled datasets, handling event noise and data-association challenges, and making full use of the spatio-temporal characteristics of event data.

### Main Contributions

1. **Introducing Motion Priors**: The authors combine the Contrast Maximization framework with parameterized motion priors (such as polynomials and Bézier curves) that balance generality against regularization of the motion, enabling long-term, dense motion estimation.
2. **Improving Zero-Shot Performance**: By combining the self-supervised loss with an existing leading supervised model (such as Bflow), the paper shows how to improve the performance of pre-trained models on unseen real data. Experiments on the EVIMO2 dataset show that fine-tuning with the self-supervised loss improves zero-shot performance by 29%.
3. **Achieving State-of-the-Art Self-Supervised Performance**: The proposed method achieves state-of-the-art self-supervised performance on the DSEC optical flow benchmark, with an average improvement of 19% in angular error and 14% in inlier percentage, while inference is 5 times faster.

### Method Overview

The proposed method has two stages:

1. **Supervised Learning Stage**: Train on synthetic data with supervision to obtain initial model weights.
2. **Self-Supervised Learning Stage**: Fine-tune on real data with the self-supervised loss to reduce the domain-adaptation gap.

To handle the high-dimensional assignment problem (associating events with trajectories), the authors propose two techniques:

- **Interpolation Method**: Interpolate through a coarse spatio-temporal displacement field that serves as a lookup table.
- **K-Nearest-Neighbor Search**: Use the sign-matrix framework to compute the K nearest-neighbor trajectories of each grid point, solving the problem efficiently on the GPU.

### Experimental Results

The paper validates the method on two main applications:

1. **Dense Continuous-Time Motion Estimation**: On the EVIMO2 dataset, fine-tuning with the self-supervised loss improves the zero-shot performance of the synthetically trained model by 29%.
2. **Optical Flow Estimation**: On the DSEC dataset, the proposed method achieves state-of-the-art performance among self-supervised methods.

In conclusion, this paper addresses key challenges in long-term, dense event-camera motion estimation by introducing motion priors and a self-supervised learning strategy, providing new ideas and technical means for research in this field.
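
The parameterized trajectories and the nearest-neighbor association described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a brute-force CPU search instead of the GPU framework mentioned in the summary, and the function names `bezier_displacement` and `knn_trajectories`, the array layouts, and the normalized time range [0, 1] are all hypothetical choices.

```python
import numpy as np
from math import comb

def bezier_displacement(control_points, t):
    """Evaluate a Bezier displacement curve at normalized times t in [0, 1].

    control_points: (n + 1, 2) array of 2D control points.
    Returns an array of shape (len(t), 2).
    """
    control_points = np.asarray(control_points, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    n = len(control_points) - 1
    # Bernstein basis: B_{i,n}(t) = C(n, i) * t^i * (1 - t)^(n - i)
    basis = np.stack(
        [comb(n, i) * t**i * (1.0 - t) ** (n - i) for i in range(n + 1)], axis=1
    )
    return basis @ control_points

def knn_trajectories(query_xy, query_t, origins, control_points, k):
    """For each query point, return the indices of the k trajectories that are
    closest to it at the query's own timestamp (brute force, for illustration).

    query_xy: (Q, 2), query_t: (Q,), origins: (J, 2),
    control_points: (J, n + 1, 2) displacement control points per trajectory.
    """
    disp = np.stack(
        [bezier_displacement(cp, query_t) for cp in control_points], axis=1
    )  # (Q, J, 2): displacement of trajectory j at the time of query q
    positions = origins[None, :, :] + disp
    sq_dist = np.sum((positions - query_xy[:, None, :]) ** 2, axis=-1)
    return np.argsort(sq_dist, axis=1)[:, :k]

# Trajectory 0 curves from (0, 0) to (10, 0); trajectory 1 stays at (20, 20).
origins = np.array([[0.0, 0.0], [20.0, 20.0]])
control_points = np.array([
    [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
])
query_xy = np.array([[5.0, 2.5], [19.0, 21.0]])
query_t = np.array([0.5, 0.9])
nearest = knn_trajectories(query_xy, query_t, origins, control_points, k=1)
print(nearest[:, 0])
```

The first query lies exactly on the curved trajectory at t = 0.5, so a straight-line motion model would have associated it less accurately; this is the kind of case where a non-linear prior matters.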

Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation

Unsupervised Learning of Scene Flow Estimation Fusing with Local Rigidity

EV-MGRFlowNet: Motion-Guided Recurrent Network for Unsupervised Event-Based Optical Flow With Hybrid Motion-Compensation Loss

Dense Continuous-Time Optical Flow from Event Cameras

Towards Anytime Optical Flow Estimation with Event Cameras

Dense Continuous-Time Optical Flow from Events and Frames

Unsupervised Learning Optical Flow in Multi-frame Dynamic Environment Using Temporal Dynamic Modeling

EV-FlowNet: Self-Supervised Optical Flow Estimation for Event-based Cameras

Event-based Optical Flow Via Transforming into Motion-dependent View

BlinkFlow: A Dataset to Push the Limits of Event-based Optical Flow Estimation

ResFlow: Fine-tuning Residual Optical Flow for Event-based High Temporal Resolution Motion Estimation

ScaleFlow++: Robust and Accurate Estimation of 3D Motion from Video

TMA: Temporal Motion Aggregation for Event-based Optical Flow

Dual-frame Fluid Motion Estimation with Test-time Optimization and Zero-divergence Loss

Spatio-Temporal Recurrent Networks for Event-Based Optical Flow Estimation

Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding

PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors

Neuromorphic Optical Flow and Real-time Implementation with Event Cameras

Optical Flow Estimation Via Motion Feature Recovery

ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild