A Novel Bounding Box Regression Method for Single Object Tracking

Omar Abdelaziz, Mohamed Sami Shehata

2024-05-17

Abstract: Locating an object in a sequence of frames, given its appearance in the first frame of the sequence, is a hard problem that involves many stages. State-of-the-art methods usually focus on introducing novel ideas in the visual encoding or relational modelling phases. In this work, however, we show that bounding box regression from learned joint search and template features is of high importance as well. While previous methods relied heavily on well-learned features representing interactions between the search and template regions, we hypothesize that the receptive field of the convolutional bounding box regression network that takes these features as input plays an important role in accurately determining the object location. To this end, we introduce two novel bounding box regression networks: inception and deformable. Experiments and ablation studies show that our inception module, installed on the recent ODTrack, outperforms the original ODTrack on three benchmarks: GOT-10k, UAV123, and OTB2015.
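
As a minimal sketch of the receptive-field argument (not the authors' network), the standard recurrence for stacked convolutions shows how parallel Inception-style branches see different spatial extents of the same joint search-template feature map. For a layer with kernel size k, stride s and dilation d, the receptive field grows by d(k - 1) times the current "jump" (spacing between adjacent outputs), and the jump is then multiplied by s. The branch layouts below are hypothetical, loosely following the classic Inception block (1x1; 1x1 then 3x3; 1x1 then 5x5; 3x3 pooling then 1x1), plus one dilated branch for comparison; pooling windows are treated like convolutions of the same size.

```python
from typing import Dict, List, Tuple

# A layer is (kernel_size, stride, dilation); pooling windows are treated the same way.
Layer = Tuple[int, int, int]

def receptive_field(layers: List[Layer]) -> int:
    """Receptive field, in cells of the input feature map, of a stack of layers."""
    rf, jump = 1, 1  # one output cell initially sees one input cell; jump = spacing of outputs
    for kernel, stride, dilation in layers:
        effective = dilation * (kernel - 1) + 1  # dilation spreads the kernel taps apart
        rf += (effective - 1) * jump             # extra extent, scaled by the current spacing
        jump *= stride                           # stride widens the spacing for later layers
    return rf

# Hypothetical parallel branches applied to the same joint search-template feature map.
branches: Dict[str, List[Layer]] = {
    "1x1": [(1, 1, 1)],
    "1x1 -> 3x3": [(1, 1, 1), (3, 1, 1)],
    "1x1 -> 5x5": [(1, 1, 1), (5, 1, 1)],
    "3x3 pool -> 1x1": [(3, 1, 1), (1, 1, 1)],
    "3x3, dilation 2": [(3, 1, 2)],
}

for name, layers in branches.items():
    print(f"{name:>16}: receptive field = {receptive_field(layers)}")
```

Concatenating such branches hands the regression head features with receptive fields of 1, 3 and 5 cells at the same time, which is the multi-receptive-field property the hypothesis above relies on; the dilated branch shows another way to reach a 5-cell extent with a 3x3 kernel.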

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper focuses on bounding box regression in Single Object Tracking (SOT). The authors point out that although current SOT methods have made significant progress in visual encoding and relational modeling, they have largely neglected the design of the network used in the bounding box regression stage. The paper therefore proposes a new method for regressing the bounding box from the joint search and template features. Its main contributions are:

1. **Two new bounding box regression networks**: an Inception module and an Inception module with deformable convolutions. Both learn feature representations over several receptive fields, which the authors hypothesize improves the accuracy of bounding box prediction.
2. **Experimental validation**: experiments and ablation studies on three benchmarks (GOT-10k, UAV123, and OTB2015) show that installing the Inception module on the Vision Transformer-based ODTrack tracker improves on the original ODTrack on all three.

In short, the work aims to improve Vision Transformer-based SOT by redesigning the bounding box regression network, with the goal of more accurate position estimation in complex visual scenes.
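
The deformable variant replaces the fixed sampling grid of a convolution with learned offsets. Below is a minimal single-channel sketch of a standard 3x3 deformable convolution evaluated at one output position, where fractional sampling locations are read by bilinear interpolation. It assumes the offsets are already given; in a real network they would be predicted by an extra convolution from the input features, and a library operator such as `torchvision.ops.DeformConv2d` would normally be used. The function names and the toy feature map are illustrative, not taken from the paper.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]

def bilinear(x: Grid, row: float, col: float) -> float:
    """Bilinearly sample grid x at a fractional (row, col); cells outside x count as 0."""
    h, w = len(x), len(x[0])
    r0, c0 = math.floor(row), math.floor(col)
    value = 0.0
    for r in (r0, r0 + 1):
        for c in (c0, c0 + 1):
            if 0 <= r < h and 0 <= c < w:
                # Weight falls off linearly with distance along each axis.
                value += (1 - abs(row - r)) * (1 - abs(col - c)) * x[r][c]
    return value

def deformable_conv_at(x: Grid, kernel: Grid, offsets: List[Tuple[float, float]],
                       p0: Tuple[int, int]) -> float:
    """3x3 deformable convolution at output position p0 (single channel, no bias).

    offsets holds one learned (dy, dx) per kernel tap, in row-major tap order.
    """
    out = 0.0
    for i, gy in enumerate((-1, 0, 1)):
        for j, gx in enumerate((-1, 0, 1)):
            dy, dx = offsets[3 * i + j]
            # Regular grid position p0 + p_k, shifted by the learned offset for this tap.
            out += kernel[i][j] * bilinear(x, p0[0] + gy + dy, p0[1] + gx + dx)
    return out

# Toy 5x5 feature map whose value at (r, c) is 5r + c, and a 3x3 averaging kernel.
fmap = [[float(5 * r + c) for c in range(5)] for r in range(5)]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
fixed = deformable_conv_at(fmap, kernel, [(0.0, 0.0)] * 9, (2, 2))
shifted = deformable_conv_at(fmap, kernel, [(0.5, 0.5)] * 9, (2, 2))
print(f"zero offsets: {fixed:.3f}, offsets (0.5, 0.5): {shifted:.3f}")
# prints: zero offsets: 12.000, offsets (0.5, 0.5): 15.000
```

With all offsets at zero the result is an ordinary 3x3 convolution; non-zero offsets let each tap move off the regular grid, so the sampling footprint, and therefore the effective receptive field, can adapt to the target's shape.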
