FE-DeTr: Keypoint Detection and Tracking in Low-quality Image Frames with Events

Xiangyuan Wang, Kuangyi Chen, Wen Yang, Lei Yu, Yannan Xing, Huai Yu
2024-03-18
Abstract: Keypoint detection and tracking in traditional image frames are often compromised by image quality issues such as motion blur and extreme lighting conditions. Event cameras offer potential solutions to these challenges by virtue of their high temporal resolution and high dynamic range. However, they have limited performance in practical applications due to their inherent noise in event data. This paper advocates fusing the complementary information from image frames and event streams to achieve more robust keypoint detection and tracking. Specifically, we propose a novel keypoint detection network that fuses the textural and structural information from image frames with the high-temporal-resolution motion information from event streams, namely FE-DeTr. The network leverages a temporal response consistency for supervision, ensuring stable and efficient keypoint detection. Moreover, we use a spatio-temporal nearest-neighbor search strategy for robust keypoint tracking. Extensive experiments are conducted on a new dataset featuring both image frames and event data captured under extreme conditions. The experimental results confirm the superior performance of our method over both existing frame-based and event-based methods.
Robotics
What problem does this paper attempt to address?
This paper addresses keypoint detection and tracking in low-quality image frames. Traditional frame-based methods degrade under motion blur and extreme lighting conditions. Event cameras, with their high temporal resolution and high dynamic range, offer a potential remedy, but the inherent noise in event data limits their performance in practical applications. The paper proposes FE-DeTr, an approach that combines complementary information from image frames and event streams to achieve more robust keypoint detection and tracking.
The approach has three components: a fused feature extractor (FFE), a motion extractor (ME), and a motion-aware head (MAH). The FFE combines textural and structural information from image frames with high-temporal-resolution motion information from event streams. The ME uses spatial and channel attention mechanisms to extract motion information from event streams. The MAH handles relative motion through an iterative strategy and improves response repeatability using deformable convolutions. For tracking, the paper adopts a spatio-temporal nearest-neighbor search strategy. Experiments on a new dataset containing both image frames and event data show that the approach outperforms existing frame-based and event-based methods under extreme conditions.
The main contributions are: proposing the first framework that combines image frames and events for keypoint detection and tracking under extreme conditions; designing a motion-aware head based on an iterative strategy, together with a supervision strategy based on temporal response consistency for long-term keypoint tracking; and creating a new dataset for keypoint detection and tracking that includes high-speed motion and extreme lighting scenes.
In conclusion, this paper addresses the key problem of improving the robustness of keypoint detection and tracking in low-quality images and challenging environments by fusing data from event cameras and traditional cameras.
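To make the tracking step more concrete, the following is a minimal Python sketch of one way a spatio-temporal nearest-neighbor search could link detected keypoints over time. It is not the authors' implementation: the function name `track_keypoints`, the greedy matching order, and the gate parameters `radius` (pixels) and `max_dt` (seconds) are all assumptions made for illustration, since the summary does not specify them.

```python
import numpy as np


def track_keypoints(frames, radius=3.0, max_dt=0.02):
    """Link keypoints over time with a greedy spatio-temporal nearest-neighbor search.

    frames: list of (timestamp, points) pairs sorted by timestamp, where
            points is an (N, 2) array-like of (x, y) keypoint locations.
    radius: hypothetical spatial gate in pixels (an assumption, not from the paper).
    max_dt: hypothetical temporal gate in seconds (an assumption, not from the paper).
    Returns a list of tracks, each a list of (timestamp, x, y) tuples.
    """
    tracks = []
    for t, points in frames:
        points = np.asarray(points, dtype=float).reshape(-1, 2)
        # Only tracks observed recently enough may be extended (temporal gate).
        active = [i for i, tr in enumerate(tracks) if t - tr[-1][0] <= max_dt]
        candidates = []
        if active and len(points) > 0:
            last = np.array([tracks[i][-1][1:] for i in active])
            # Pairwise distances: rows are active tracks, columns are new points.
            dist = np.linalg.norm(last[:, None, :] - points[None, :, :], axis=2)
            # Keep only pairs inside the spatial gate, closest pairs first.
            rows, cols = np.nonzero(dist <= radius)
            candidates = sorted(zip(dist[rows, cols], rows, cols))
        used_tracks, used_points = set(), set()
        for _, r, c in candidates:
            r, c = int(r), int(c)
            if r in used_tracks or c in used_points:
                continue  # each track and each point is matched at most once
            tracks[active[r]].append((t, float(points[c, 0]), float(points[c, 1])))
            used_tracks.add(r)
            used_points.add(c)
        # Detections without a match start new tracks.
        for c in range(len(points)):
            if c not in used_points:
                tracks.append([(t, float(points[c, 0]), float(points[c, 1]))])
    return tracks
```

Calling `track_keypoints` on a time-ordered list of detections returns one list per track. A track stops being extended once no detection falls within `radius` of its last position for longer than `max_dt`. Greedy closest-first matching is chosen here for brevity; an optimal assignment (for example, the Hungarian algorithm) could be substituted without changing the interface.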