Attention shake siamese network with auxiliary relocation branch for visual object tracking

Jun Wang,Weibin Liu,Weiwei Xing,Liqiang Wang,Shunli Zhang
DOI: https://doi.org/10.1016/j.neucom.2020.02.120
IF: 6
2020-08-01
Neurocomputing
Abstract:<p>The Siamese network is highly regarded in the visual object tracking field because of its unique advantages of pairwise input and pairwise training. It can measure the similarity between two image patches, which coincides with the principle of matching-based tracking algorithms. In this paper, a variant Siamese network based tracker is proposed that introduces an attention module into the traditional Siamese network and relocates the object with auxiliary relocation methods when the tracker runs in an untrusted state. Firstly, a novel attention shake layer is proposed to replace the max pooling layer in the Siamese network. This layer introduces and trains two different kinds of attention modules at the same time, which means it can also improve the expressive power of the Siamese network without increasing the depth of the network. Secondly, an auxiliary relocation branch is proposed to assist in object relocation and tracking. Based on prior assumptions about visual object tracking, several weights are involved in the auxiliary relocation branch: a structure similarity weight, a motion similarity weight, a motion smoothness weight and an object saliency weight. Thirdly, a novel response map based switch function is proposed to monitor the tracking process and control the effect of the auxiliary relocation branch. Furthermore, in order to examine the effect of the pooling layer in the Siamese network, 9 pooling and attention architectures are proposed and discussed in this paper. Empirical results are shown in the experiment part. Compared with state-of-the-art trackers, the proposed tracker achieves comparable performance on multiple benchmarks.</p>
computer science, artificial intelligence
What problem does this paper attempt to address?
The paper attempts to solve three key challenges in visual object tracking:

1. **Enhancing Expressive Power**: How to enhance the expressive power of the AlexNet-based Siamese network without increasing the network depth or the number of layers.
2. **Introducing Prior Knowledge**: How to introduce prior knowledge about visual object tracking (such as structural similarity weight, motion similarity weight, motion smoothness weight, and object saliency weight) into the Siamese-network-based tracker to optimize and improve the tracking results.
3. **Monitoring the Tracking Process**: How to monitor the tracking process, detect tracking failures, and relocate the target when the tracker is in an untrusted state.

To address these challenges, the authors propose the following solutions:

- **Attention Shake Layer**: A new Attention Shake Layer is proposed to replace the max-pooling layer in the Siamese network. This layer combines two different attention modules and trains them jointly through the shake-shake framework, which helps avoid overfitting while enhancing the network's expressive power.
- **Auxiliary Relocation Branch**: An auxiliary relocation branch is designed, which uses prior knowledge such as structural similarity weight, motion similarity weight, motion smoothness weight, and object saliency weight to help relocate the target when the tracker is in an untrusted state.
- **Response Map Based Switch Function**: A switch function based on the response map is proposed for online monitoring of the tracking process. When the value of the switch function falls below a certain threshold, the tracker is considered to be in an untrusted state, and the auxiliary relocation branch intervenes to optimize the tracking results.

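The mixing step of the Attention Shake Layer can be illustrated with a minimal sketch. It assumes the layer follows the usual shake-shake regularization recipe: during training the two branch outputs are blended with a freshly drawn random coefficient, and at inference they are averaged. The function name `shake_combine`, the flat-list representation of the branch outputs, and the `rng` parameter are hypothetical choices for illustration, not taken from the paper. The two attention modules themselves are not specified in this summary, so the sketch starts from their outputs.

```python
import random


def shake_combine(branch_a, branch_b, training, rng=random):
    """Blend two equally shaped branch outputs (flat lists of floats).

    Forward pass only. In shake-shake regularization the backward pass
    uses a separately drawn coefficient, which is omitted here.
    """
    # Training: draw a fresh mixing coefficient alpha in [0, 1).
    # Inference: use the expected value 0.5 so the output is deterministic.
    alpha = rng.random() if training else 0.5
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(branch_a, branch_b)]


# Hypothetical outputs of the two attention branches for one feature vector.
attention_a = [1.0, 2.0]
attention_b = [3.0, 4.0]
print(shake_combine(attention_a, attention_b, training=False))
```

Because the blend is a convex combination, each mixed value stays between the two branch values, so the random coefficient perturbs training without changing the scale of the features.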
Through these methods, the paper aims to improve the performance and robustness of the Siamese network in visual object tracking tasks.
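The idea of a response-map-based switch can be sketched as follows. This summary does not give the paper's actual switch function, so the sketch substitutes a different, well-known confidence measure: average peak-to-correlation energy (APCE), which comes from the correlation-filter tracking literature. The function names and the threshold value are hypothetical. A sharp single peak yields a high score, while a flat or multi-peaked map yields a low one.

```python
def apce(response):
    """Average peak-to-correlation energy of a 2-D response map (list of rows).

    Stand-in confidence measure; NOT the switch function from the paper.
    """
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    # Mean squared deviation of every response value from the minimum.
    energy = sum((v - f_min) ** 2 for v in values) / len(values)
    if energy == 0.0:
        return 0.0  # flat map: no distinguishable peak
    return (f_max - f_min) ** 2 / energy


def is_untrusted(response, threshold):
    """Return True when the confidence score falls below the threshold."""
    return apce(response) < threshold


# Toy response maps: one clear peak versus two competing peaks.
sharp = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
ambiguous = [[1, 0, 1], [0, 0, 0], [0, 0, 0]]
THRESHOLD = 5.0  # hypothetical value chosen for this toy example
print(is_untrusted(sharp, THRESHOLD), is_untrusted(ambiguous, THRESHOLD))
```

In a tracker built along these lines, a `True` result would be the point at which a relocation step takes over from the plain Siamese response.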