Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation

Friedhelm Hamann, Ziyun Wang, Ioannis Asmanis, Kenneth Chaney, Guillermo Gallego, Kostas Daniilidis
2024-07-15
Abstract: Current optical flow and point-tracking methods rely heavily on synthetic datasets. Event cameras are novel vision sensors with advantages in challenging visual conditions, but state-of-the-art frame-based methods cannot be easily adapted to event data due to the limitations of current event simulators. We introduce a novel self-supervised loss combining the Contrast Maximization framework with a non-linear motion prior in the form of pixel-level trajectories and propose an efficient solution to solve the high-dimensional assignment problem between non-linear trajectories and events. Their effectiveness is demonstrated in two scenarios: In dense continuous-time motion estimation, our method improves the zero-shot performance of a synthetically trained model on the real-world dataset EVIMO2 by 29%. In optical flow estimation, our method elevates a simple UNet to achieve state-of-the-art performance among self-supervised methods on the DSEC optical flow benchmark. Our code is available at https://github.com/tub-rip/MotionPriorCMax.
Computer Vision and Pattern Recognition, Machine Learning, Robotics
What problem does this paper attempt to address?
The main problem this paper addresses is **dense, long-term motion estimation from event cameras**, and in particular reducing the domain-adaptation gap (the gap between simulated data and real data) that limits previous methods. Specifically, the paper focuses on using event-camera data for dense continuous-time motion estimation in scenes with complex, non-linear motion. This requires working around the lack of large-scale labeled datasets, handling event noise and the data-association problem, and exploiting the spatio-temporal structure of event data.

### Main Contributions

1. **Introducing Motion Priors**: The authors combine the contrast maximization framework with parameterized motion priors (such as polynomials and Bézier curves). These priors balance the generality of the motion model against its regularization, which makes dense, long-term motion estimation possible.
2. **Improving Zero-Shot Performance**: By combining the self-supervised loss with existing top supervised models (such as Bflow), the paper shows how to improve the performance of pre-trained models on unseen real data. On the EVIMO2 dataset, fine-tuning with the self-supervised loss improves zero-shot performance by 29%.
3. **Achieving State-of-the-Art Self-Supervised Performance**: The proposed method achieves state-of-the-art performance among self-supervised methods on the DSEC optical flow benchmark, with an average improvement of 19% in angular error and 14% in inlier percentage, while inference is 5 times faster.

### Method Overview

The proposed method has two stages:

1. **Supervised Learning Stage**: Train on synthetic data with supervision to obtain initial model weights.
2. **Self-Supervised Learning Stage**: Fine-tune on real data with the self-supervised loss to reduce the domain-adaptation gap.

To handle the high-dimensional assignment problem (i.e., associating events with trajectories), the authors propose two techniques:

- **Interpolation Method**: Use a coarse spatio-temporal displacement field as a lookup table and interpolate within it.
- **K-Nearest-Neighbor Search**: Use the sign-matrix framework to compute the K nearest-neighbor trajectories of each grid point, which solves the problem efficiently on the GPU.

### Experimental Results

The paper validates the method experimentally on two main applications:

1. **Dense Continuous-Time Motion Estimation**: On the EVIMO2 dataset, fine-tuning with the self-supervised loss improves the zero-shot performance of the synthetically trained model by 29%.
2. **Optical Flow Estimation**: On the DSEC dataset, the proposed method achieves state-of-the-art performance among self-supervised methods.

In conclusion, the paper addresses key challenges in dense, long-term motion estimation from event cameras by introducing motion priors and a self-supervised learning strategy, offering new ideas and tools for research in this field.
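
To make the idea of contrast maximization with a non-linear motion prior concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a single global polynomial trajectory shared by all events (the paper uses dense pixel-level trajectories predicted by a network), nearest-pixel accumulation, and image variance as the contrast measure. The names `warp_events` and `contrast` are hypothetical and chosen only for illustration.

```python
import numpy as np

def warp_events(xy, t, coeffs, t_ref=0.0):
    """Move events back to t_ref along a polynomial trajectory.

    coeffs has shape (degree, 2), and the assumed displacement is
    d(t) = sum_k coeffs[k-1] * (t - t_ref)**k.
    """
    dt = t - t_ref
    # Columns are dt, dt^2, ..., dt^degree -> shape (N, degree)
    powers = np.stack([dt ** (k + 1) for k in range(coeffs.shape[0])], axis=1)
    return xy - powers @ coeffs  # shape (N, 2)

def contrast(xy_warped, height, width):
    """Variance of the image of warped events (nearest-pixel voting)."""
    cols = np.rint(xy_warped[:, 0]).astype(int)
    rows = np.rint(xy_warped[:, 1]).astype(int)
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    iwe = np.zeros((height, width))
    # np.add.at accumulates correctly when several events hit the same pixel
    np.add.at(iwe, (rows[inside], cols[inside]), 1.0)
    return iwe.var()

# Synthetic events: a vertical edge starting at x = 10 that follows a curved path.
rng = np.random.default_rng(0)
n = 2000
t = rng.uniform(0.0, 1.0, n)
true_coeffs = np.array([[20.0, 0.0],   # linear term (px/s)
                        [15.0, 0.0]])  # quadratic term (px/s^2)
start = np.stack([np.full(n, 10.0), rng.uniform(0.0, 63.0, n)], axis=1)
xy = start + np.stack([t, t ** 2], axis=1) @ true_coeffs

# Contrast under three candidate motions: none, best-guess linear, true curve.
c_none = contrast(warp_events(xy, t, np.zeros((2, 2))), 64, 64)
c_linear = contrast(warp_events(xy, t, np.array([[35.0, 0.0], [0.0, 0.0]])), 64, 64)
c_true = contrast(warp_events(xy, t, true_coeffs), 64, 64)
print(c_none, c_linear, c_true)
```

In this toy setup the curved trajectory yields the sharpest image of warped events, the linear approximation leaves residual blur, and no compensation is blurriest. That ordering is the signal a contrast-based loss exploits. A training loss would need a differentiable accumulation (for example bilinear voting) in place of the rounding used here.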