Cross-Modal Semi-Dense 6-DoF Tracking of an Event Camera in Challenging Conditions

Yi-Fan Zuo, Wanting Xu, Xia Wang, Yifu Wang, Laurent Kneip
2024-01-16
Abstract: Vision-based localization is a cost-effective and thus attractive solution for many intelligent mobile platforms. However, its accuracy and especially robustness still suffer from low illumination conditions, illumination changes, and aggressive motion. Event-based cameras are bio-inspired visual sensors that perform well in HDR conditions and have high temporal resolution, and thus provide an interesting alternative in such challenging scenarios. While purely event-based solutions currently do not yet produce satisfying mapping results, the present work demonstrates the feasibility of purely event-based tracking if an alternative sensor is permitted for mapping. The method relies on geometric 3D-2D registration of semi-dense maps and events, and achieves highly reliable and accurate cross-modal tracking results. Practically relevant scenarios are given by depth camera-supported tracking or map-based localization with a semi-dense map prior created by a regular image-based visual SLAM or structure-from-motion system. Conventional edge-based 3D-2D alignment is extended by a novel polarity-aware registration that makes use of signed time-surface maps (STSM) obtained from event streams. We furthermore introduce a novel culling strategy for occluded points. Both modifications increase the speed of the tracker and its robustness against occlusions or large view-point variations. The approach is validated on many real datasets covering the above-mentioned challenging conditions, and compared against similar solutions realised with regular cameras.
Robotics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to achieve accurate and robust event-camera-based visual localization and tracking under challenging conditions such as low light, illumination changes, and aggressive motion. Specifically, the authors propose a cross-modal semi-dense 6-degree-of-freedom (6-DoF) tracking method that registers the event stream against a semi-dense 3D map. Conventional frame-based solutions often lack robustness in such environments, whereas event cameras cope well with them thanks to their high dynamic range (HDR) and high temporal resolution. However, building a high-quality 3D map from event data alone remains difficult. The proposed method therefore takes the map from another source, such as a depth camera or a regular image-based visual SLAM or structure-from-motion system, and uses the events purely for tracking.

### Specific problems solved by the paper:

1. **Tracking under low-light conditions**: Conventional cameras perform poorly in low light, while event cameras still deliver a reliable event stream thanks to their high dynamic range.
2. **Illumination changes**: Illumination changes degrade the image quality of conventional cameras and make feature extraction unstable. Event cameras respond to brightness changes rather than absolute intensity, so they maintain good tracking performance when the illumination varies strongly.
3. **Aggressive motion**: Conventional cameras are prone to motion blur during fast or aggressive motion, which degrades tracking. Event cameras have high temporal resolution and can capture rapidly changing scenes, so they remain usable under such motion.
4. **Quality of 3D maps**: Building a high-quality 3D map from event data alone is difficult. The paper instead obtains the map from a depth camera or another sensor, which improves map quality and, in turn, tracking accuracy.

### Main contributions:

1. **Cross-modal tracking method**: A cross-modal tracking method based on an event camera and a semi-dense 3D point-cloud prior. It achieves accurate and robust 6-DoF tracking under challenging conditions.
2. **Signed time-surface map (STSM)**: The time-surface map is split into three sub-maps (positive, neutral, and negative) so that the polarity of the events can be used to improve the convergence and accuracy of tracking.
3. **Predictive semi-dense point registration strategy**: A new predictive semi-dense point registration strategy that improves tracking speed and robustness by discarding occluded points.
4. **Open-source code**: The code of the framework is released. It supports multiple tracking methods, including methods based on event cameras and on conventional cameras.

### Experimental verification:

The method is validated on multiple real-world datasets covering indoor and outdoor environments and the challenging conditions listed above. The results show that it achieves higher accuracy and robustness than comparable solutions based on regular cameras under high-dynamic-range, low-light, and aggressive-motion conditions.

In conclusion, the paper proposes an efficient cross-modal tracking method that combines event-camera data with a map obtained from other sensors, and thereby addresses the tracking problems that conventional cameras face under challenging conditions.
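To illustrate the signed time-surface idea listed under the main contributions, the sketch below builds one time surface per event polarity. It is a minimal illustration, not the authors' implementation: the exponential-decay formulation is a common way to define a time surface, and the function name `signed_time_surfaces`, the event tuple layout `(x, y, t, polarity)`, and the decay constant `tau` are all assumptions made here. The neutral sub-map is omitted because this summary does not define it.

```python
import numpy as np


def signed_time_surfaces(events, height, width, t_ref, tau=0.03):
    """Build per-polarity time surfaces at reference time t_ref.

    events: iterable of (x, y, t, polarity), polarity > 0 or < 0 (assumed layout).
    Returns (pos, neg): arrays in [0, 1], where 1 means an event fired at t_ref
    and 0 means no event of that polarity was ever seen at the pixel.
    """
    # Timestamp of the most recent event per pixel, kept separately per polarity.
    last_pos = np.full((height, width), -np.inf)
    last_neg = np.full((height, width), -np.inf)

    for x, y, t, polarity in events:
        if t > t_ref:
            continue  # ignore events that lie after the reference time
        target = last_pos if polarity > 0 else last_neg
        target[y, x] = max(target[y, x], t)

    # Exponential decay: recent events give values near 1, old ones near 0.
    # Pixels without events stay at -inf and therefore map to exactly 0.
    pos = np.exp((last_pos - t_ref) / tau)
    neg = np.exp((last_neg - t_ref) / tau)
    return pos, neg


# Hypothetical toy input: one positive and one negative event on a 2x3 sensor.
toy_events = [(1, 0, 0.10, +1), (2, 1, 0.07, -1)]
pos, neg = signed_time_surfaces(toy_events, height=2, width=3, t_ref=0.10)
```

Keeping the polarities in separate maps means a projected map point can be matched only against events of a compatible sign, which is the intuition behind polarity-aware registration. How the paper assigns signs to map points is not described in this summary.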