Online Infrared UAV Target Tracking with Enhanced Context-Awareness and Pixel-Wise Attention Modulation

Houzhang Fang, Chenxing Wu, Xiaolin Wang, Fan Zhou, Yi Chang, Luxin Yan
DOI: https://doi.org/10.1109/tgrs.2024.3432108
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Unmanned aerial vehicles (UAVs) have become popular in many commercial and industrial applications, but they also pose significant threats to urban safety and aerial security. Intelligent UAV surveillance based on thermal infrared (TIR) imaging has attracted increasing attention for its long-range monitoring ability in both day and night scenarios. However, weak UAV target features, dynamically changing UAV states, and complex background interference present serious challenges to accurate UAV tracking. To tackle these problems, we propose a novel online multiscale infrared UAV target (IRUT) tracking network (SiamCAP) that incorporates enhanced context feature awareness and pixel-wise attention modulation. We first introduce a novel contrast-enhanced multiscale online re-parameterization block (CMORB) that effectively extracts contrast difference intensity information between the target and the background and can be collapsed into a single branch for both training and inference without introducing additional computational overhead. Then, we construct a feature fusion modulation module (FFMM) to guide cross-layer feature aggregation. It uses low-level attention to highlight the UAV target feature in the deep layer with the novel full spatial resolution channel attention (FSRCA), which calculates pixel-wise importance without dimensionality compression. Finally, we propose a cross-attention-based updatable feature interaction module (CUFIM) to model the correlation between the online-updated multiple templates and the search frame, which improves the model's robustness to changes in UAV state and to complex backgrounds. Extensive experiments on real infrared UAV datasets demonstrate that the proposed approach outperforms state-of-the-art (SOTA) trackers under complex backgrounds while achieving real-time tracking speed.
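The abstract states that CMORB has several branches yet runs as a single branch. The sketch below shows only the general principle behind structural re-parameterization (the idea popularized by RepVGG/OREPA-style blocks) and is not the paper's CMORB. Several details are assumptions made for illustration: a single channel, stride 1, zero "same" padding, and no batch normalization. The branch set is also hypothetical: a 3x3 conv, a 1x1 conv, and a fixed centre-minus-surround "contrast" kernel scaled by a learnable `alpha`. All function names are hypothetical too. Because convolution is linear in its kernel, the branch kernels and biases can be summed into one 3x3 kernel that gives the same output.

```python
# Minimal sketch of structural re-parameterization (RepVGG/OREPA-style idea),
# NOT the paper's CMORB. Single channel, stride 1, zero "same" padding, no BN.
# The branch set and every name below are hypothetical, chosen for illustration.
from typing import List

Matrix = List[List[float]]


def conv2d_same(image: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """Cross-correlate `image` with a 3x3 `kernel` (zero padding, same output size)."""
    h, w = len(image), len(image[0])
    out = [[bias] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += kernel[di + 1][dj + 1] * image[y][x]
    return out


def pad_1x1_to_3x3(weight: float) -> Matrix:
    """Embed a 1x1 kernel at the centre of an otherwise-zero 3x3 kernel."""
    return [[0.0, 0.0, 0.0], [0.0, weight, 0.0], [0.0, 0.0, 0.0]]


def contrast_kernel(alpha: float) -> Matrix:
    """Hypothetical contrast branch: alpha * (centre - mean of the 8 neighbours).

    Its taps sum to zero, so it responds to local intensity differences
    (e.g. a small hot target on a flat background) rather than absolute level.
    """
    n = -alpha / 8.0
    return [[n, n, n], [n, alpha, n], [n, n, n]]


def multi_branch(image, k3, b3, w1, b1, alpha):
    """Multi-branch view: run each branch separately and sum their outputs."""
    y3 = conv2d_same(image, k3, b3)
    y1 = conv2d_same(image, pad_1x1_to_3x3(w1), b1)
    yc = conv2d_same(image, contrast_kernel(alpha))
    return [[a + b + c for a, b, c in zip(r3, r1, rc)]
            for r3, r1, rc in zip(y3, y1, yc)]


def reparameterize(k3, b3, w1, b1, alpha):
    """Fold all branches into one 3x3 kernel and one bias.

    Valid because convolution is linear in its kernel:
    sum_k conv(x, K_k) + sum_k b_k == conv(x, sum_k K_k) + sum_k b_k.
    """
    branches = [k3, pad_1x1_to_3x3(w1), contrast_kernel(alpha)]
    kernel = [[sum(k[r][c] for k in branches) for c in range(3)] for r in range(3)]
    return kernel, b3 + b1


# Usage: the merged single branch reproduces the multi-branch output.
image = [[0.0, 0.1, 0.0, 0.2],
         [0.1, 0.9, 0.2, 0.0],  # 0.9: a small bright, UAV-like hot spot
         [0.0, 0.2, 0.1, 0.1],
         [0.3, 0.0, 0.1, 0.0]]
k3 = [[0.1, -0.2, 0.0], [0.3, 0.5, -0.1], [0.0, 0.2, 0.1]]
merged_k, merged_b = reparameterize(k3, 0.05, 0.7, -0.02, 1.5)
y_multi = multi_branch(image, k3, 0.05, 0.7, -0.02, 1.5)
y_single = conv2d_same(image, merged_k, merged_b)
max_err = max(abs(a - b) for ra, rb in zip(y_multi, y_single) for a, b in zip(ra, rb))
print(max_err < 1e-12)  # True
```

The folding only holds when every branch is linear, with no activation inside a branch. That is why re-parameterized blocks apply their nonlinearity after the merged sum. One plausible reading of "single branch for both training and inference" is that an online variant applies the same weight folding at every training step, so only the merged convolution is ever evaluated. The abstract does not confirm this detail; it is our assumption.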