Abstract: General visual object tracking models have advanced dramatically thanks to large annotated datasets and progressively stronger network architectures. However, a general tracker inevitably suffers from domain shift when applied directly to specific testing scenarios. In this paper, we address the animal tracking problem by proposing a spatio-temporal inference module and a coarse-to-fine tracking strategy. Non-rigid deformation is a typical challenge in animal tracking. We therefore design a novel transformer-based inference structure that transmits the changing animal state across consecutive frames. By explicitly transmitting appearance variations, this spatio-temporal module enables adaptive target learning and boosts animal tracking performance compared with fixed template-matching approaches. In addition, because animal contours change from frame to frame, we perform coarse-to-fine tracking with a dedicated distribution-aware regression module to obtain a fine-grained animal bounding box. The coarse tracking phase focuses on distinguishing the target from potential distractors in the background, while the fine-grained tracking phase aims to accurately regress the final animal bounding box. To facilitate animal tracking evaluation, we captured and annotated 145 video sequences covering 20 animal categories at a zoo, forming a new animal tracking test set named ZOO145. We also assembled AnimalSOT, a dataset of 162 video sequences drawn from existing tracking benchmarks. Experimental results on the animal tracking datasets MoCA, ZOO145, and AnimalSOT demonstrate the merit of the proposed approach over advanced general tracking approaches and provide a baseline for future animal tracking studies.
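
The abstract does not say how the distribution-aware regression module is built. The sketch below is one plausible reading, not the authors' implementation: it assumes a design similar to the discretized box-side distributions of Generalized Focal Loss, where each side offset is predicted as a categorical distribution over bins and decoded as its expectation. The coarse phase is reduced to picking the peak of a classification score map. All names (`coarse_locate`, `expected_offset`, `coarse_to_fine_box`, `stride`, `bin_size`) are hypothetical, and in the real tracker these quantities come from learned transformer heads rather than hand-written inputs.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(logits: Sequence[float], bin_size: float) -> float:
    """Decode one box side (ASSUMED GFL-style distribution regression).

    The head predicts a categorical distribution over bins 0..n-1 and the
    offset is its expectation, so a flat or bimodal distribution can express
    uncertainty about an ambiguous, deforming contour.
    """
    probs = softmax(logits)
    return bin_size * sum(i * p for i, p in enumerate(probs))


def coarse_locate(score_map: List[List[float]]) -> Tuple[int, int]:
    """Coarse stage: return the (row, col) cell with the highest target score,
    standing in for discriminating the target from background distractors."""
    cells = ((r, c) for r in range(len(score_map)) for c in range(len(score_map[0])))
    return max(cells, key=lambda rc: score_map[rc[0]][rc[1]])


def coarse_to_fine_box(
    score_map: List[List[float]],
    side_logits: Sequence[Sequence[float]],  # left, top, right, bottom
    stride: float,
    bin_size: float,
) -> Tuple[float, float, float, float]:
    """Hypothetical coarse-to-fine decoder returning (x1, y1, x2, y2) in pixels."""
    row, col = coarse_locate(score_map)
    # Map the coarse grid cell back to an image-space center point.
    cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
    # Fine stage: refine each side from its own distribution (assumed to be
    # the regression head's output at the coarse peak).
    left, top, right, bottom = (expected_offset(lg, bin_size) for lg in side_logits)
    return (cx - left, cy - top, cx + right, cy + bottom)


if __name__ == "__main__":
    score_map = [[0.1, 0.2, 0.1],
                 [0.3, 0.9, 0.2],
                 [0.1, 0.4, 0.8]]
    flat = [0.0] * 5  # uniform over 5 bins, so each side decodes to about 2 bins
    print(coarse_to_fine_box(score_map, [flat] * 4, stride=16.0, bin_size=4.0))
```

With these toy inputs the coarse peak is the center cell, so the box is centered at (24, 24) and each side extends about 8 px (2 bins of 4 px). Decoding an expectation rather than an argmax gives sub-bin precision, which is one plausible motivation for regressing a distribution instead of a single value.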

Towards Highly Effective Moving Tiny Ball Tracking Via Vision Transformer

TrackNetV4: Enhancing Fast Sports Object Tracking with Motion Attention Maps

Tracking Small and Fast Moving Objects: A Benchmark

Computational Analysis of Table Tennis Matches from Real-Time Videos Using Deep Learning

GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds

Widely Applicable Strong Baseline for Sports Ball Detection and Tracking

Efficient Golf Ball Detection and Tracking Based on Convolutional Neural Networks and Kalman Filter

Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking

GTA: Global Tracklet Association for Multi-Object Tracking in Sports

Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline

A Badminton Recognition and Tracking System Based on Context Multi-feature Fusion

High-Speed Tiny Tennis Ball Detection Based on Deep Convolutional Neural Networks

AViTMP: A Tracking-Specific Transformer for Single-Branch Visual Tracking

Intelligent Optimization Algorithm of 3D Tracking Technology in Football Player Moving Image Analysis

Small Object Tracking in LiDAR Point Cloud: Learning the Target-awareness Prototype and Fine-grained Search Region

Tracking the Soccer Ball Using Multiple Fixed Cameras

Robust Visual Tracking Method via Deep Learning

BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View

SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos

A Three-Level Scheme For Real-Time Ball Tracking

Learning Adaptive Spatio-Temporal Inference Transformer for Coarse-to-Fine Animal Visual Tracking: Algorithm and Benchmark