Abstract: Multi-object tracking (MOT) on low-frame-rate videos is a promising way to meet the computing, storage, and transmission-bandwidth constraints of edge devices. Tracking at a low frame rate poses particular challenges in the association stage, because objects in two successive frames typically exhibit much larger changes in location, velocity, appearance, and visibility than they do at normal frame rates. In this paper, we observe that such variations cause severe performance degradation in many existing association strategies. Although optical-flow-based methods such as CenterTrack can handle large displacements to some extent thanks to their large receptive field, their temporally local nature prevents them from giving reliable displacement estimates for objects that newly appear in the current frame (i.e., objects not visible in the previous frame). To overcome this limitation, we propose an online tracking method that extends the CenterTrack architecture with a new head, named APP, to recognize unreliable displacement estimates. Further, to capture the fine-grained, individual unreliability of each displacement estimate, we extend the binary APP predictions to displacement uncertainties. To this end, we reformulate the displacement estimation task using Bayesian deep learning tools. With the APP predictions, we conduct association in a multi-stage manner, where visual cues or historical motion cues are leveraged in the corresponding stage. By rethinking the commonly used bipartite matching algorithms, we equip the proposed multi-stage association policy with a hybrid matching strategy conditioned on displacement uncertainties. Our method is robust in preserving identities in low-frame-rate video sequences. Experimental results on public datasets in various low-frame-rate settings demonstrate the advantages of the proposed method.
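To make the multi-stage association idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each detection carries a CenterTrack-style displacement toward its previous-frame position and a binary flag marking that displacement as unreliable (standing in for the APP prediction). The names `greedy_match` and `associate`, the dictionary keys, the constant-velocity fallback, and the `max_dist` threshold are all hypothetical, and greedy nearest-neighbor pairing is used as a simple stand-in for the bipartite and hybrid matching described above.

```python
from math import hypot


def greedy_match(sources, targets, max_dist):
    """Pair source and target points greedily by ascending Euclidean distance."""
    pairs = sorted(
        (hypot(s[0] - t[0], s[1] - t[1]), i, j)
        for i, s in enumerate(sources)
        for j, t in enumerate(targets)
    )
    used_s, used_t, matches = set(), set(), []
    for dist, i, j in pairs:
        if dist > max_dist:
            break  # all remaining pairs are even farther apart
        if i in used_s or j in used_t:
            continue
        used_s.add(i)
        used_t.add(j)
        matches.append((i, j))
    return matches


def associate(tracks, detections, max_dist=50.0):
    """Two-stage association (illustrative assumption, not the paper's policy).

    tracks:     list of {"pos": (x, y), "vel": (vx, vy)}
    detections: list of {"pos": (x, y), "disp": (dx, dy), "unreliable": bool}
    Returns (assignment, unmatched), where assignment maps a detection index
    to a track index and unmatched lists detections left without a track.
    """
    assignment = {}

    # Stage 1: detections with a trusted displacement are projected back to
    # the previous frame and compared with the last known track positions.
    reliable = [i for i, d in enumerate(detections) if not d["unreliable"]]
    back = [
        (detections[i]["pos"][0] + detections[i]["disp"][0],
         detections[i]["pos"][1] + detections[i]["disp"][1])
        for i in reliable
    ]
    last = [t["pos"] for t in tracks]
    for i, j in greedy_match(back, last, max_dist):
        assignment[reliable[i]] = j

    # Stage 2: remaining detections ignore their displacement and are compared
    # with a constant-velocity prediction built from each free track's history.
    rest_d = [i for i in range(len(detections)) if i not in assignment]
    taken = set(assignment.values())
    rest_t = [j for j in range(len(tracks)) if j not in taken]
    cur = [detections[i]["pos"] for i in rest_d]
    pred = [
        (tracks[j]["pos"][0] + tracks[j]["vel"][0],
         tracks[j]["pos"][1] + tracks[j]["vel"][1])
        for j in rest_t
    ]
    for i, j in greedy_match(cur, pred, max_dist):
        assignment[rest_d[i]] = rest_t[j]

    unmatched = [i for i in range(len(detections)) if i not in assignment]
    return assignment, unmatched
```

In this sketch, a detection that stays unmatched after both stages would start a new track. Replacing the binary flag with a continuous uncertainty, for example as a weight on the matching cost, would be closer in spirit to the uncertainty-conditioned matching the abstract describes, but the exact formulation is not given here.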

APPTracker Plus : Displacement Uncertainty for Occlusion Handling in Low-Frame-Rate Multiple Object Tracking
