LiDAR-Event Stereo Fusion with Hallucinations

Luca Bartolomei, Matteo Poggi, Andrea Conti, Stefano Mattoccia
2024-08-09
Abstract: Event stereo matching is an emerging technique to estimate depth from neuromorphic cameras; however, events are unlikely to trigger in the absence of motion or the presence of large, untextured regions, making the correspondence problem extremely challenging. Purposely, we propose integrating a stereo event camera with a fixed-frequency active sensor -- e.g., a LiDAR -- collecting sparse depth measurements, overcoming the aforementioned limitations. Such depth hints are used by hallucinating -- i.e., inserting fictitious events -- the stacks or raw input streams, compensating for the lack of information in the absence of brightness changes. Our techniques are general, can be adapted to any structured representation to stack events and outperform state-of-the-art fusion methods applied to event-based stereo.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the difficulty of estimating depth from event cameras when there is no motion or brightness change. Because an event camera fires only where brightness changes, the event stream becomes sparse in static scenes and in large textureless regions, leaving stereo matching algorithms too little information to match events between the left and right cameras.

To solve this, the paper fuses a stereo event camera with a fixed-frequency active sensor such as a LiDAR, which supplies sparse depth measurements. The authors design two strategies that use these measurements to enrich the event data:

1. **Virtual Stack Hallucination (VSH)**: inserts fictitious event patterns into the event stacks, making local patterns more distinctive and therefore easier to match.
2. **Back-in-Time Hallucination (BTH)**: generates fictitious events directly in the continuous event domain (the raw streams), making correspondences more distinctive.

Both strategies simplify matching while preserving the microsecond-level resolution of the event camera, and they can make use of asynchronous LiDAR data. In the reported experiments, both strategies outperform existing fusion methods on the DSEC and M3ED datasets.
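To make the two hallucination ideas more concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the paper: the stereo pair is rectified, depth hints have already been converted to disparities aligned with the left view (0 meaning "no hint"), stacks have shape `(C, H, W)`, raw events are rows of `[t, x, y, polarity]`, and the fictitious patterns are drawn at random. The function names and the parameters `t_ref`, `n_fake`, and `dt` are hypothetical.

```python
import numpy as np


def hallucinate_stacks(left, right, hints, rng=None):
    """Stack-level sketch: write one shared random pattern at each hinted match.

    left, right: float arrays of shape (C, H, W), rectified event stacks.
    hints: float array (H, W) of disparities in the left view, 0 = no hint.
    """
    rng = np.random.default_rng() if rng is None else rng
    left, right = left.copy(), right.copy()  # leave the inputs untouched
    channels = left.shape[0]
    ys, xs = np.nonzero(hints > 0)
    for y, x in zip(ys, xs):
        xr = x - int(round(hints[y, x]))  # matching column in the right view
        if xr < 0:
            continue  # correspondence falls outside the right image
        pattern = rng.uniform(-1.0, 1.0, size=channels)
        left[:, y, x] = pattern   # identical values on both sides make
        right[:, y, xr] = pattern  # this pixel pair easy to match
    return left, right


def hallucinate_events(left_ev, right_ev, hints, t_ref, n_fake=4, dt=1e-3, rng=None):
    """Stream-level sketch: add matching fictitious events to both raw streams.

    left_ev, right_ev: float arrays of shape (N, 4) with rows [t, x, y, polarity].
    Fake events get timestamps in [t_ref - dt, t_ref] (an assumed window).
    """
    rng = np.random.default_rng() if rng is None else rng
    fake_l, fake_r = [], []
    ys, xs = np.nonzero(hints > 0)
    for y, x in zip(ys, xs):
        xr = x - int(round(hints[y, x]))
        if xr < 0:
            continue
        ts = t_ref - rng.uniform(0.0, dt, size=n_fake)
        ps = rng.choice([-1.0, 1.0], size=n_fake)
        for t, p in zip(ts, ps):
            fake_l.append((t, x, y, p))   # same timestamp and polarity
            fake_r.append((t, xr, y, p))  # at the corresponding right pixel

    def merge(events, fake):
        if not fake:
            return events.copy()
        merged = np.vstack([events, np.asarray(fake, dtype=events.dtype)])
        return merged[np.argsort(merged[:, 0], kind="stable")]  # keep time order

    return merge(left_ev, fake_l), merge(right_ev, fake_r)
```

In both functions the key point is the same: whatever is inserted at a left pixel is inserted identically at the right pixel that the depth hint says it corresponds to, so a matching network sees a distinctive, consistent signal even where no real events were triggered. The stream-level variant operates before any stacking, which is why it can be paired with any event representation.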