Feature enhancement and coarse-to-fine detection for RGB-D tracking

Xue-Feng Zhu, Tianyang Xu, Xiao-Jun Wu, Josef Kittler
DOI: https://doi.org/10.1016/j.patrec.2024.02.007
IF: 4.757
2024-02-12
Pattern Recognition Letters
Abstract: Existing RGB-D tracking algorithms improve performance by constructing typical appearance models borrowed from RGB-only tracking frameworks, with no attempt to exploit the complementary visual information in the multi-modal input. This paper addresses this deficit and presents a novel algorithm that boosts the performance of RGB-D tracking by taking advantage of collaborative clues. To guarantee input consistency, depth images are encoded into the three-channel HHA representation, which gives them a structure similar to that of RGB images, so that deep CNN features can be extracted from both modalities. To highlight the discriminative information in the multi-modal features, a feature enhancement module using a cross-attention strategy is proposed. With the attention map produced by the proposed cross-attention method, the target area of the features is enhanced and the negative influence of the background is suppressed. In addition, we address potential tracking failures by introducing a long-term mechanism. The experimental results obtained on well-known benchmark datasets, including PTB, STC, and CDTB, demonstrate the superiority of the proposed RGB-D tracker. On PTB, the proposed method achieves the highest AUC scores among the compared trackers across scenarios with five distinct challenging attributes. On STC and CDTB, our FECD tracker obtains an overall AUC of 0.630 and an F-score of 0.630, respectively.
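The cross-attention idea mentioned in the abstract can be illustrated with a minimal sketch. This is a generic scaled dot-product cross-attention in plain Python, not the paper's actual module: the function names (`cross_attention`, `enhance`), the choice of RGB features as queries and HHA features as keys and values, and the residual addition are all assumptions made for illustration. Feature maps are assumed to be flattened into lists of per-position channel vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: list of N vectors from one modality (assumed: RGB features)
    keys, values: lists of M vectors from the other modality (assumed: HHA)
    Returns the attended features and the N x M attention map.
    """
    scale = math.sqrt(len(queries[0]))
    attended, attn_map = [], []
    for q in queries:
        # Similarity of this query position to every key position.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attn_map.append(weights)
        # Attention-weighted sum of the value vectors, channel by channel.
        attended.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return attended, attn_map


def enhance(rgb_feats, hha_feats):
    """Hypothetical enhancement step: add HHA-attended features to RGB ones."""
    attended, attn_map = cross_attention(rgb_feats, hha_feats, hha_feats)
    enhanced = [
        [r + a for r, a in zip(rf, af)]
        for rf, af in zip(rgb_feats, attended)
    ]
    return enhanced, attn_map


if __name__ == "__main__":
    # Toy features: 3 RGB positions and 2 HHA positions, 2 channels each.
    rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    hha = [[1.0, 0.0], [0.0, 1.0]]
    enhanced, attn = enhance(rgb, hha)
    for row in attn:
        print([round(w, 3) for w in row])
```

Each row of the attention map sums to one, so positions whose features agree across the two modalities receive larger weights, which is one simple way to picture how a cross-modal attention map can emphasize some regions over others.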
computer science, artificial intelligence