Abstract: Multi-object tracking (MOT) on low-frame-rate videos is a promising way to meet the computing, storage, and transmission-bandwidth constraints of edge devices. Tracking at a low frame rate poses particular challenges in the association stage, because objects in two successive frames typically exhibit much larger changes in location, velocity, appearance, and visibility than they do at normal frame rates. In this paper, we observe severe performance degradation in many existing association strategies caused by such changes. Although optical-flow-based methods such as CenterTrack can handle large displacements to some extent thanks to their large receptive fields, their temporally local nature prevents them from giving reliable displacement estimates for objects that newly appear in the current frame (i.e., objects not visible in the previous frame). To overcome this limitation, we propose an online tracking method that extends the CenterTrack architecture with a new head, named APP, to recognize unreliable displacement estimates. Further, to capture the fine-grained, individual unreliability of each displacement estimate, we extend the binary APP predictions to displacement uncertainties; to this end, we reformulate the displacement estimation task using Bayesian deep learning tools. With the APP predictions, we conduct association in a multi-stage manner, where visual cues or historical motion cues are leveraged in the corresponding stage. By rethinking the commonly used bipartite matching algorithms, we equip the proposed multi-stage association policy with a hybrid matching strategy conditioned on displacement uncertainties. Our method is robust in preserving identities in low-frame-rate video sequences. Experimental results on public datasets in various low-frame-rate settings demonstrate the advantages of the proposed method.

Tracking Generic Human Motion Via Fusion of Low- and High-Dimensional Approaches

Fusion of low- and high-dimensional approaches by trackers sampling for generic human motion tracking

Tracking an Object over 200 FPS with the Fusion of Prior Probability and Kalman Filter

Beyond Traditional Driving Scenes: A Robotic-Centric Paradigm for 2D+3D Human Tracking Using Siamese Transformer Network

APPTracker Plus: Displacement Uncertainty for Occlusion Handling in Low-Frame-Rate Multiple Object Tracking

Fusing Monocular Images and Sparse IMU Signals for Real-time Human Motion Capture

PA-Pose: Partial Point Cloud Fusion Based on Reliable Alignment for 6D Pose Tracking

Multiview 3-D Human Motion Tracking with Information Fusion Sampling

Motion-Driven Tracking via End-to-End Coarse-to-Fine Verifying

APPTracker: Improving Tracking Multiple Objects in Low-Frame-Rate Videos

Simultaneous 3-D Human-Motion Tracking and Voxel Reconstruction

Human motion tracking by temporal-spatial local gaussian process experts

HybridFusion: Real-Time Performance Capture Using a Single Depth Sensor and Sparse IMUs

DFSTrack: Dual-stream fusion Siamese network for human pose tracking in videos

Learning-based Tracking of Complex Non-Rigid Motion

Long-Term 3D Point Tracking By Cost Volume Fusion

Real-time tracking based on deep feature fusion

Full-body Human Motion Reconstruction with Sparse Joint Tracking Using Flexible Sensors

A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals

Multi-object Tracking Via MHT with Multiple Information Fusion in Surveillance Video

Human Motion Tracking by Multiple RGBD Cameras