Twofold Structured Features-Based Siamese Network for Infrared Target Tracking

Wei-Jie Yan, Yun-Kai Xu, Qian Chen, Xiao-Fang Kong, Guo-Hua Gu, A-Jun Shao, Min-Jie Wan
2024-06-27
Abstract: Nowadays, infrared target tracking has become a critical technology in the field of computer vision, with applications such as motion analysis, pedestrian surveillance, and intelligent detection. Unfortunately, due to the lack of color, texture and other detailed information, tracking drift often occurs when the tracker encounters infrared targets that vary in size or shape. To address this issue, we present a twofold structured features-based Siamese network for infrared target tracking. First, to improve the discriminative capacity for infrared targets, a novel feature fusion network is proposed that fuses both shallow spatial information and deep semantic information into the extracted features in a comprehensive manner. Then, a multi-template update module based on a template update mechanism is designed to deal effectively with interference from target appearance changes, which are prone to cause early tracking failures. Finally, both qualitative and quantitative experiments are carried out on the VOT-TIR 2016 dataset, which demonstrate that our method achieves a balance between promising tracking performance and real-time tracking speed compared with other state-of-the-art trackers.
Image and Video Processing
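The abstract states that shallow spatial information and deep semantic information are fused, but it does not specify the fusion operator. As a rough illustration only, the NumPy sketch below assumes a hypothetical two-way scheme: the shallow map is average-pooled and projected with a 1x1 convolution into the deep branch, and the deep map is nearest-neighbour upsampled and projected into the shallow branch, each fused by addition. The function names (`fuse_twofold`, `avg_pool`, `upsample_nearest`, `conv1x1`), the pooling and upsampling choices, and the toy shapes are all assumptions, not the authors' network.

```python
import numpy as np

# Hypothetical sketch of two-way shallow/deep feature fusion (not the authors' code).
# Feature maps are NumPy arrays of shape (channels, height, width).

def avg_pool(x, k):
    """Average-pool a (C, H, W) map by an integer factor k (H and W divisible by k)."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def upsample_nearest(x, k):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def conv1x1(x, weight):
    """1x1 convolution as a channel-mixing matrix: weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def fuse_twofold(shallow, deep, w_s2d, w_d2s):
    """Assumed fusion rule: add projected shallow detail to the deep map,
    and projected deep semantics to the shallow map.

    shallow: (C_s, H, W); deep: (C_d, H/k, W/k)
    w_s2d:   (C_d, C_s) projects shallow channels into the deep branch
    w_d2s:   (C_s, C_d) projects deep channels into the shallow branch
    """
    k = shallow.shape[1] // deep.shape[1]  # spatial stride between the two stages
    deep_fused = deep + conv1x1(avg_pool(shallow, k), w_s2d)
    shallow_fused = shallow + conv1x1(upsample_nearest(deep, k), w_d2s)
    return deep_fused, shallow_fused

# Toy usage with random features and weights (shapes are illustrative only).
rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 16, 16))  # high resolution, few channels
deep = rng.standard_normal((32, 4, 4))      # low resolution, many channels
w_s2d = 0.1 * rng.standard_normal((32, 8))
w_d2s = 0.1 * rng.standard_normal((8, 32))
deep_fused, shallow_fused = fuse_twofold(shallow, deep, w_s2d, w_d2s)
print(deep_fused.shape, shallow_fused.shape)  # (32, 4, 4) (8, 16, 16)
```

Each branch keeps its own resolution and channel count, so the fused maps can replace the original shallow and deep features without changing the downstream matching stage. In a trained network the projection weights would be learned rather than random.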
What problem does this paper attempt to address?
This paper addresses the challenges of tracking targets in infrared images. Because infrared images lack detailed information such as texture and color, and infrared targets often change in shape or size, trackers are prone to drift, which eventually leads to tracking failure. To address this problem, the authors propose a Siamese network based on twofold structured features (TSF-SiamMU) to improve the accuracy and robustness of infrared target tracking.

### Main contributions of the paper

1. **New feature fusion network**:
   - A novel feature fusion network is proposed that fuses both shallow spatial information and deep semantic information into the extracted features, enhancing the network's ability to discriminate blurry infrared targets.
   - The improved feature representation enables the tracker to identify targets more accurately in complex scenes.
2. **Multi-template update module**:
   - A multi-template update module is designed on top of a template update mechanism. It estimates the current optimal template by aggregating the initial template, the cumulative template, and the current template.
   - This module effectively handles interference caused by appearance changes of infrared targets, thereby reducing tracking drift.
3. **Experimental verification**:
   - Qualitative and quantitative experiments are carried out on real infrared sequences from the VOT-TIR 2016 dataset. The results show that TSF-SiamMU outperforms other state-of-the-art methods in precision and success rate while also running in real time, at an average speed of 47 FPS.

### Specific methods for solving the problem

- **Feature fusion network**:
  - ResNet-50 is used as the backbone feature extraction network, and a twofold structured feature network is designed to combine shallow and deep features.
  - Shallow features mainly carry spatial information, while deep features mainly carry semantic information. Combining them lets the network capture target information at different levels.
- **Multi-template update module**:
  - A multi-template update mechanism is introduced in which a simplified residual network combines the different templates during tracking.
  - The initial template provides the most reliable information. The current template is extracted at the target position predicted in the previous frame, and the final template combines the initial template with two output templates extracted by the convolutional network.

### Summary

By introducing the twofold structured feature fusion network and the multi-template update module, the paper addresses key problems in infrared target tracking and performs well in complex scenes and under target appearance changes. The experimental results demonstrate the method's advantages in both precision and real-time performance.
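The multi-template update module is described above only at a high level. The NumPy sketch below is a hypothetical residual formulation consistent with that description: the initial, accumulated and current templates are stacked along the channel axis, passed through two 1x1 convolutions with a ReLU in between, and the result is added back to the initial template as a skip connection, so the most reliable template always anchors the output. The layer sizes, the ReLU, the function names (`update_template`, `run_updates`, `conv1x1`) and the rule that each step's output becomes the next accumulated template are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

# Hypothetical residual multi-template update (illustrative, not the authors' code).
# Templates are NumPy arrays of shape (channels, height, width).

def conv1x1(x, weight):
    """1x1 convolution as a channel-mixing matrix: weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def update_template(t_init, t_acc, t_cur, w1, w2):
    """Combine the initial, accumulated and current templates.

    w1: (C_hidden, 3*C) mixes the stacked templates; w2: (C, C_hidden) maps back.
    The initial template is added as a skip connection, so the output is the
    initial template plus a learned correction.
    """
    stacked = np.concatenate([t_init, t_acc, t_cur], axis=0)  # (3C, H, W)
    hidden = np.maximum(conv1x1(stacked, w1), 0.0)            # ReLU
    return t_init + conv1x1(hidden, w2)                       # (C, H, W)

def run_updates(t_init, current_templates, w1, w2):
    """Apply the update frame by frame; each output becomes the next
    accumulated template (an assumed rule). In a tracker, each current
    template would come from a crop at the previously predicted position."""
    t_acc = t_init
    for t_cur in current_templates:
        t_acc = update_template(t_init, t_acc, t_cur, w1, w2)
    return t_acc

# Toy usage with random templates and weights.
rng = np.random.default_rng(1)
C, H, W, HIDDEN = 4, 3, 3, 8
t_init = rng.standard_normal((C, H, W))
frames = [rng.standard_normal((C, H, W)) for _ in range(5)]
w1 = 0.1 * rng.standard_normal((HIDDEN, 3 * C))
w2 = 0.1 * rng.standard_normal((C, HIDDEN))
t_final = run_updates(t_init, frames, w1, w2)
print(t_final.shape)  # (4, 3, 3)
```

Anchoring the output to the initial template means the correction only has to model appearance change, and an untrained (all-zero) correction simply returns the first-frame template.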