Multi-Stage Fusion for Event-based Multimodal Tracker

Xu Jia, Dong Wang, Huchuan Lu, Shengming Li, Xinyu Zhang, Wenyue Chen, Hefei Huang
DOI: https://doi.org/10.1109/ICME57554.2024.10687995
2024-07-15
Abstract: Event cameras are bio-inspired sensors with high dynamic range and high temporal resolution, properties that favor visual object tracking. Existing methods already fuse the event and RGB modalities with cross-domain feature integrators to improve tracking performance, and researchers have developed several architectures for processing or fusing the event modality that successfully boost tracking performance. In this work, we design an RGB-E tracker with multi-stage fusion. In the early stage, frames are enhanced with the aid of events to mitigate degradation from blur or under-/over-exposure. In the middle stage, a fusion module performs feature-level integration. In the late stage, we carry out decision-level fusion: tracking boxes are predicted from frame features, event features, and fused features, and the prediction with the highest score is taken as the final estimate. Our design thoroughly integrates information across these levels, allowing each modality to contribute to the tracking process as much as possible. Extensive experiments demonstrate that the proposed method performs favorably against state-of-the-art RGB-E trackers in both accuracy and efficiency.
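The late-stage decision-level fusion described in the abstract amounts to choosing, among the boxes predicted from frame, event, and fused features, the one with the highest confidence score. The following is a minimal Python sketch of that selection step only, under stated assumptions: the branch names ("frame", "event", "fused"), the `BranchPrediction` type, the `(x, y, width, height)` box format, and the `decision_level_fusion` helper are all hypothetical, and the sketch assumes the three heads produce scores on a directly comparable scale, which the abstract does not specify. It does not model the early-stage enhancement or the middle-stage fusion module.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical box format assumed for this sketch: (x, y, width, height).
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class BranchPrediction:
    """Output of one (hypothetical) tracking head: a box and its confidence."""
    box: Box
    score: float


def decision_level_fusion(
    predictions: Dict[str, BranchPrediction],
) -> Tuple[str, BranchPrediction]:
    """Pick the branch whose prediction has the highest score.

    `predictions` maps a branch name (e.g. "frame", "event", "fused") to the
    box predicted from that branch's features. Scores are assumed to be
    directly comparable across branches.
    """
    if not predictions:
        raise ValueError("at least one branch prediction is required")
    # max() keeps the first maximal key in insertion order, so ties favor
    # the branch that was added to the dict first.
    best = max(predictions, key=lambda name: predictions[name].score)
    return best, predictions[best]


if __name__ == "__main__":
    # Made-up boxes and scores, for illustration only.
    preds = {
        "frame": BranchPrediction(box=(100.0, 50.0, 40.0, 30.0), score=0.42),
        "event": BranchPrediction(box=(104.0, 52.0, 38.0, 29.0), score=0.71),
        "fused": BranchPrediction(box=(103.0, 51.0, 39.0, 30.0), score=0.66),
    }
    name, pred = decision_level_fusion(preds)
    print(name, pred.box)  # event (104.0, 52.0, 38.0, 29.0)
```

Because Python's `max` returns the first maximal element in iteration order, ties in this sketch go to whichever branch was inserted into the dictionary first; the abstract does not state how the actual tracker breaks ties or calibrates scores across heads.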