BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events

Yijin Li, Yichen Shen, Zhaoyang Huang, Shuo Chen, Weikang Bian, Xiaoyu Shi, Fu-Yun Wang, Keqiang Sun, Hujun Bao, Zhaopeng Cui, Guofeng Zhang, Hongsheng Li
2024-10-27
Abstract: Recent advances in event-based vision suggest that these systems complement traditional cameras by providing continuous observation without frame rate limitations and a high dynamic range, making them well-suited for correspondence tasks such as optical flow and point tracking. However, there is still a lack of comprehensive benchmarks for correspondence tasks that include both event data and images. To address this gap, we propose BlinkVision, a large-scale and diverse benchmark with multiple modalities and dense correspondence annotations. BlinkVision offers several valuable features: 1) Rich modalities: It includes both event data and RGB images. 2) Extensive annotations: It provides dense per-pixel annotations covering optical flow, scene flow, and point tracking. 3) Large vocabulary: It contains 410 everyday categories, sharing common classes with popular 2D and 3D datasets like LVIS and ShapeNet. 4) Naturalistic: It delivers photorealistic data and covers various naturalistic factors, such as camera shake and deformation. BlinkVision enables extensive benchmarks on three types of correspondence tasks (optical flow, point tracking, and scene flow estimation) for both image-based and event-based methods, offering new observations, practices, and insights for future research. The benchmark website is https://www.blinkvision.net/.
Computer Vision and Pattern Recognition
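The abstract credits event cameras with continuous observation and a high dynamic range. Both properties follow from how each pixel fires in the idealized model commonly used by event simulators: a pixel emits an event of polarity +1 or -1 each time its log intensity changes by a contrast threshold C. The sketch below applies that model to a single pair of frames. The helper name `events_between_frames`, the threshold value, and the linear-in-log-space interpolation are illustrative assumptions, not a description of how BlinkVision actually synthesizes its event data.

```python
import numpy as np


def events_between_frames(prev, curr, t0, t1, threshold=0.2, eps=1e-3):
    """Hypothetical helper: idealized event generation between two frames.

    Assumes log intensity changes linearly between t0 and t1 and that each
    pixel's reference level is its log intensity at t0. Any residual change
    below the threshold is discarded here; a real simulator would carry the
    per-pixel reference level over to the next frame pair.
    Returns (x, y, t, polarity) tuples sorted by timestamp.
    """
    # eps avoids log(0) for black pixels
    log_prev = np.log(prev.astype(np.float64) + eps)
    log_curr = np.log(curr.astype(np.float64) + eps)
    delta = log_curr - log_prev

    events = []
    ys, xs = np.nonzero(np.abs(delta) >= threshold)
    for y, x in zip(ys, xs):
        d = delta[y, x]
        polarity = 1 if d > 0 else -1
        n_crossings = int(np.floor(abs(d) / threshold))
        for k in range(1, n_crossings + 1):
            # time of the k-th threshold crossing under linear interpolation
            t = t0 + (t1 - t0) * (k * threshold) / abs(d)
            events.append((int(x), int(y), float(t), polarity))

    events.sort(key=lambda e: e[2])
    return events
```

Because each pixel responds to changes in log intensity and timestamps its own events, this model is one way to see why event cameras are described as having a high dynamic range and no fixed frame rate.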
What problem does this paper attempt to address?
The paper attempts to address the problem of how to combine data from traditional cameras (RGB images) and event cameras to improve performance on tasks such as optical flow, scene flow, and point tracking estimation. Specifically, the paper points out:

1. **Limitations of existing benchmark datasets**: Most current benchmark datasets are either biased towards specific scenes or overly simplistic, and few provide event data and RGB images together. This hinders the development of algorithms that fully exploit event data and fuse information from both modalities.
2. **Challenges in pixel correspondence tasks**: Traditional image-based methods perform poorly when dealing with large frame intervals, motion blur under extreme lighting conditions, and limited dynamic range. Event cameras, thanks to their high dynamic range and lack of frame rate limitations, have potential advantages in these situations, but existing event-based methods have not yet fully realized this potential.

To address these challenges, the paper proposes BlinkVision, a large-scale, diverse synthetic benchmark dataset designed to promote research on optical flow, scene flow, and point tracking estimation from RGB images and event data. BlinkVision has the following features:

- **Rich modalities**: Includes final RGB images, clean RGB images, and event data.
- **Rich annotations**: Provides dense pixel-level annotations covering optical flow, scene flow, and point tracking.
- **Wide range of categories**: Contains 410 everyday categories, sharing common classes with widely used 2D and 3D datasets such as LVIS and ShapeNet.
- **Naturalism**: Provides photorealistic data covering various naturalistic factors, such as camera shake and deformation.

Through BlinkVision, researchers can evaluate and improve image-based and event-based algorithms more comprehensively, thereby advancing the development of computer vision systems.
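Dense per-pixel annotations let a predicted correspondence field be scored point by point. A common choice in optical flow benchmarks is the average endpoint error (EPE): the mean Euclidean distance between predicted and ground-truth vectors. The minimal sketch below assumes NumPy arrays whose last axis holds the vector components, plus an optional boolean validity mask. The function name `endpoint_error` and this data layout are hypothetical, and the sketch does not claim to reproduce the exact metrics BlinkVision reports. Point-tracking evaluations in particular often use threshold-based accuracy and visibility terms as well.

```python
import numpy as np


def endpoint_error(pred, gt, valid=None):
    """Hypothetical helper: mean Euclidean distance between vector fields.

    The last axis holds vector components, e.g. (H, W, 2) optical flow,
    (H, W, 3) scene flow, or (N, T, 2) point-track positions.
    `valid` is an optional boolean mask over the leading axes.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")

    # per-pixel (or per-point) endpoint error
    err = np.linalg.norm(pred - gt, axis=-1)
    if valid is not None:
        err = err[np.asarray(valid, dtype=bool)]
    if err.size == 0:
        raise ValueError("no valid entries to average")
    return float(err.mean())
```

For instance, a 2x2 flow prediction that is exact everywhere except at one pixel, where it is off by (3, 4), has an EPE of 5 / 4 = 1.25 pixels.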