A Novel Bounding Box Regression Method for Single Object Tracking

Omar Abdelaziz, Mohamed Sami Shehata
2024-05-17
Abstract: Locating an object in a sequence of frames, given its appearance in the first frame of the sequence, is a hard problem that involves many stages. Usually, state-of-the-art methods focus on bringing novel ideas in the visual encoding or relational modelling phases. However, in this work, we show that bounding box regression from learned joint search and template features is of high importance as well. While previous methods relied heavily on well-learned features representing interactions between search and template, we hypothesize that the receptive field of the input convolutional bounding box network plays an important role in accurately determining the object location. To this end, we introduce two novel bounding box regression networks: inception and deformable. Experiments and ablation studies show that our inception module installed on the recent ODTrack outperforms the latter on three benchmarks: the GOT-10k, the UAV123 and the OTB2015.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses bounding box regression in Single Object Tracking (SOT). The authors argue that although current SOT methods have made significant progress in visual encoding and relational modeling, they have comparatively neglected the design of the network that performs bounding box regression. The paper proposes a new method for regressing the bounding box from joint search and template features. Its main contributions are:

1. **Two new bounding box regression networks**: an Inception module and an Inception module with deformable convolutions. Both learn feature representations with different receptive fields, which is intended to improve the accuracy of bounding box prediction.
2. **Experimental validation**: Experiments and ablation studies on three benchmark datasets (GOT-10k, UAV123, and OTB2015) indicate that the proposed regression networks improve Vision Transformer-based SOT models; in particular, the Inception module installed on ODTrack outperforms the original ODTrack on all three benchmarks.

In short, the work aims to improve Vision Transformer-based SOT systems through better design of the bounding box regression network, with a focus on localization accuracy in complex visual scenes.
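To make the receptive-field argument concrete, the following sketch computes the theoretical receptive field of each parallel branch in an Inception-style block using the standard recurrence for stacked convolutions. The branch layout (`branches`), the `Conv` helper, and the kernel sizes are illustrative assumptions; they are not taken from the paper and do not describe the authors' actual architecture.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Conv:
    """Hypothetical description of one square convolution layer."""
    kernel: int
    stride: int = 1
    dilation: int = 1


def receptive_field(layers: List[Conv]) -> int:
    """Theoretical receptive field (in input pixels) of a stack of convolutions.

    Standard recurrence: each layer widens the field by
    (kernel - 1) * dilation * jump, where jump is the product of the
    strides of all preceding layers.
    """
    rf = 1
    jump = 1
    for layer in layers:
        rf += (layer.kernel - 1) * layer.dilation * jump
        jump *= layer.stride
    return rf


# Assumed branch layout, loosely modeled on a classic Inception block.
branches: Dict[str, List[Conv]] = {
    "1x1": [Conv(1)],
    "1x1 -> 3x3": [Conv(1), Conv(3)],
    "1x1 -> 5x5": [Conv(1), Conv(5)],
    "1x1 -> dilated 3x3": [Conv(1), Conv(3, dilation=2)],
}

# Each branch sees a different spatial extent of the joint feature map;
# concatenating the branch outputs exposes all of these scales at once.
branch_rf = {name: receptive_field(layers) for name, layers in branches.items()}

for name, rf in branch_rf.items():
    print(f"{name}: {rf}x{rf}")
```

A single-path head, by contrast, has one receptive field per layer. A deformable convolution samples at learned offsets, so its effective receptive field depends on the input and is not captured by this fixed formula.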