Temporal Event Stereo via Joint Learning with Stereoscopic Flow

Hoonhee Cho, Jae-Young Kang, Kuk-Jin Yoon
2024-07-15
Abstract: Event cameras are dynamic vision sensors inspired by the biological retina, characterized by their high dynamic range, high temporal resolution, and low power consumption. These features make them capable of perceiving 3D environments even in extreme conditions. Event data is continuous across the time dimension, which allows a detailed description of each pixel's movements. To fully utilize the temporally dense and continuous nature of event cameras, we propose a novel temporal event stereo, a framework that continuously uses information from previous time steps. This is accomplished through the simultaneous training of an event stereo matching network alongside stereoscopic flow, a new concept that captures all pixel movements from stereo cameras. Since obtaining ground truth for optical flow during training is challenging, we propose a method that uses only disparity maps to train the stereoscopic flow. The performance of event-based stereo matching is enhanced by temporally aggregating information using the flows. We have achieved state-of-the-art performance on the MVSEC and the DSEC datasets. The method is computationally efficient, as it stacks previous information in a cascading manner. The code is available at https://github.com/mickeykang16/TemporalEventStereo.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper asks how to exploit the temporally dense, continuous nature of event-camera data to improve stereo matching, especially under extreme conditions such as fast motion and high-dynamic-range scenes. It proposes **temporal event stereo**, a framework that pursues this goal by jointly learning stereoscopic flow.

### Specific description of the problem

1. **Advantages and challenges of event cameras**:
   - Event cameras offer high dynamic range, high temporal resolution, and low power consumption, which lets them perceive the 3D environment under extreme conditions.
   - However, event data is sparse in space but dense in time, so using this temporal information effectively for stereo matching is a challenge.
2. **Limitations of existing methods**:
   - Existing methods that use temporal aggregation to improve stereo matching usually transfer information only at the level of low-level features and ignore high-level representations such as the cost volume.
   - Existing methods also often need separate networks to process different types of information, which raises the computational cost.

### Proposed solutions

To overcome these problems, the paper proposes the following innovations:

1. **Stereoscopic Flow**:
   - The paper defines a new concept, stereoscopic flow, which captures the motion of all pixels across both cameras of a stereo pair.
   - Stereoscopic flow comprises a horizontal displacement for each camera and a single vertical displacement shared by both, which suits the needs of stereo matching.
2. **Feature Warping and Cost Volume Warping**:
   - Past features and cost volumes are warped to the current time step using the stereoscopic flow and fused with the current information. This enriches the intermediate representations (feature maps and cost volumes) and improves stereo matching.
   - This approach reuses past information without significantly increasing the computational cost.
3. **Temporal Disparity Consistency Loss**:
   - A new loss function, the temporal disparity consistency loss, trains the stereoscopic flow from disparity maps alone, so training remains stable even without ground-truth optical flow labels.

### Experimental results

The paper reports experiments on the MVSEC and DSEC datasets. The results show that the proposed method achieves state-of-the-art performance on multiple metrics and is computationally efficient, because it stacks previous information in a cascading manner.

In summary, the paper aims to significantly improve stereo matching by introducing stereoscopic flow and temporal aggregation, making full use of the temporal characteristics of event cameras.
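
The idea of training stereoscopic flow from disparity maps alone can be illustrated with a small NumPy sketch. It is not the paper's implementation: it assumes rectified stereo with disparity defined as `x_left - x_right`, uses a simple nearest-neighbor forward warp in place of whatever differentiable warping the authors use, and scores the result with a mean absolute error. All function names here are hypothetical. Under these assumptions, a left-view pixel with disparity `d` that moves by `u_left` while its right-view match moves by `u_right` (both sharing the vertical shift `v`) ends up with disparity `d + u_left - u_right`, which can be compared with the current disparity map.

```python
import numpy as np


def propagate_disparity(disp_prev, u_left, u_right):
    """Disparity each left-view pixel would have after the motion.

    Assumes rectified stereo and disparity = x_left - x_right, so the
    shared vertical displacement cancels and only the difference of the
    two horizontal displacements changes the disparity.
    """
    return disp_prev + u_left - u_right


def forward_warp_nearest(values, u, v):
    """Move values[y, x] to (y + v, x + u), rounded to the nearest pixel.

    Pixels that receive no value are NaN. If several sources land on the
    same target, the last one written wins (a simplification).
    """
    h, w = values.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.rint(xs + u).astype(int)
    yt = np.rint(ys + v).astype(int)
    inside = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    out = np.full((h, w), np.nan)
    out[yt[inside], xt[inside]] = values[inside]
    return out


def temporal_disparity_consistency(disp_prev, disp_curr, u_left, u_right, v):
    """Mean absolute gap between flow-propagated and current disparity."""
    propagated = propagate_disparity(disp_prev, u_left, u_right)
    predicted = forward_warp_nearest(propagated, u_left, v)
    valid = ~np.isnan(predicted)
    if not valid.any():
        return 0.0
    return float(np.mean(np.abs(predicted[valid] - disp_curr[valid])))


# Toy scene: a flat surface whose disparity grows from 2 to 3 pixels.
shape = (4, 6)
disp_prev = np.full(shape, 2.0)
disp_curr = np.full(shape, 3.0)
u_left = np.ones(shape)    # left view shifts 1 px to the right
u_right = np.zeros(shape)  # right view stays put
v = np.zeros(shape)        # shared vertical displacement

print(temporal_disparity_consistency(disp_prev, disp_curr, u_left, u_right, v))
```

In the toy scene the flow explains the disparity change exactly, so the printed error is 0.0. Setting `u_right` to ones as well leaves the propagated disparity at 2 and raises the error to 1.0, which is the kind of signal a disparity-only loss could use to correct the flow.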