Optimized Information Flow for Transformer Tracking

Janani Kugarajeevan, Thanikasalam Kokul, Amirthalingam Ramanan, Subha Fernando
2024-02-13
Abstract: One-stream Transformer trackers have shown outstanding performance on challenging benchmark datasets over the last three years, as they enable interaction between the target template and search region tokens to extract target-oriented features with mutual guidance. Previous approaches allow free bidirectional information flow between template and search tokens without investigating its influence on the tracker's discriminative capability. In this study, we conduct a detailed analysis of the information flow between the tokens and, based on the findings, propose a novel Optimized Information Flow Tracking (OIFTrack) framework to enhance the discriminative capability of the tracker. The proposed OIFTrack blocks the interaction from all search tokens to target template tokens in the early encoder layers, as the large number of non-target tokens in the search region diminishes the importance of target-specific features. In the deeper encoder layers of the proposed tracker, search tokens are partitioned into target search tokens and non-target search tokens, allowing bidirectional flow from target search tokens to template tokens to capture the appearance changes of the target. In addition, since the proposed tracker incorporates dynamic background cues, distractor objects are successfully avoided by capturing the surrounding information of the target. OIFTrack demonstrates outstanding performance on challenging benchmarks, particularly excelling on the one-shot tracking benchmark GOT-10k, where it achieves an average overlap of 74.6%. The code, models, and results of this work are available at https://github.com/JananiKugaa/OIFTrack
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses Visual Object Tracking (VOT), and in particular how to improve the performance of trackers built on the Transformer architecture. It focuses on the following points:

1. **Optimizing Information Flow Mechanism**: The study finds that in existing one-stream Transformer trackers, the free flow of token information between the template (target area) and the search area can weaken the tracker's discriminative ability. The paper therefore proposes a new framework, **Optimized Information Flow Tracking (OIFTrack)**, which improves discriminative ability by controlling the information flow between template and search area tokens.
   - **Information Flow Control**: In the early encoder layers, OIFTrack blocks all information flow from search tokens to the target template tokens, which limits the influence of the many non-target tokens in the search region. In the deeper encoder layers, the search tokens are divided into target search tokens and non-target search tokens, and only the former exchange information bidirectionally with the template tokens, which captures changes in the target's appearance.
   - **Dynamic Background Cues**: The method also introduces dynamic background cues, which capture the surroundings of the target and help the tracker avoid distractors.
2. **Application of Dynamic Templates**: To handle changes in the target's appearance, some existing methods use dynamic templates, which are high-confidence target areas extracted from intermediate frames. However, these methods usually just concatenate the dynamic templates with the initial template and search area tokens, without considering how information flows between the different types of tokens. OIFTrack improves tracking performance by controlling these information flows.
3. **Experimental Validation**: The paper validates OIFTrack through experiments on challenging benchmark datasets, including the one-shot tracking benchmark GOT-10k, where it achieves an average overlap of 74.6%.

In summary, the main contribution of this paper is a new one-stream Transformer tracking framework, OIFTrack, which enhances the tracker's discriminative ability and tracking accuracy through fine-grained control of information flow. The controlled flow from target search tokens to template tokens lets the tracker adapt to changes in the target's appearance, while the dynamic background cues help it avoid distractors.
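
The information flow control described above can be pictured as an attention mask over the concatenated template and search tokens. The following is a minimal NumPy sketch, not the authors' implementation: the function names `build_flow_mask` and `masked_attention`, the token ordering `[template, search]`, and the `target_search_idx` argument are all assumptions made for illustration. It also omits the dynamic template and dynamic background tokens, and it takes the set of target search tokens as given, since how the tracker selects them is not described here.

```python
import numpy as np


def build_flow_mask(num_template, num_search, target_search_idx=None):
    """Boolean mask M where M[q, k] is True if query token q may attend to key token k.

    Assumed (hypothetical) token order: [template tokens..., search tokens...].
    """
    n = num_template + num_search
    mask = np.ones((n, n), dtype=bool)
    # Early-layer behaviour: block flow from search tokens into template tokens,
    # i.e. template queries may not read any search key.
    mask[:num_template, num_template:] = False
    if target_search_idx is not None:
        # Deeper-layer behaviour: re-open flow only from search tokens that
        # are treated as lying on the target (indices relative to the search block).
        cols = num_template + np.asarray(target_search_idx, dtype=int)
        mask[:num_template, cols] = True
    return mask


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention that respects a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)  # disallowed pairs get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    # 2 template tokens and 4 search tokens; search tokens 1 and 2 are
    # assumed to cover the target in the deeper layers.
    early_mask = build_flow_mask(2, 4)
    deep_mask = build_flow_mask(2, 4, target_search_idx=[1, 2])
    print(early_mask.astype(int))
    print(deep_mask.astype(int))
```

In both masks the search rows stay fully open, so search tokens can always read from the template. Only the template rows change between the early and deeper layers, which is what keeps non-target search tokens from diluting the template features in this sketch.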