Abstract: As more and more video surveillance systems are deployed, pedestrian tracking has become an important and widely studied problem in computer vision. The core challenge is to build a tracking system that is both accurate and real-time. These two goals are in tension: deep neural networks give the most precise target detection, but they are time-consuming when processing large amounts of data. In this paper, we propose a real-time tracking system for top-view depth data that integrates the newest YOLOv5 detector with the DeepSort tracker and processes a high-quality video stream at nearly 40 frames per second. We use DeepSort to reduce identity switches (ID switches) during tracking. The top-view camera orientation helps the neural network distinguish people in occluded crowds, and the use of depth data avoids privacy leaks. Once accurate detections are available, the Kalman filter, the Hungarian algorithm, and a matching cascade associate the multiple detections with existing tracks. To validate the proposed surveillance system, we conducted experiments on several existing datasets that meet our data requirements, as well as on two datasets we recorded with various cameras in our laboratory and in an outdoor environment. The results show the advantages of top-view depth data for tracking-by-detection, with accuracy improved to 99.3% mAP@0.5, the best among comparable methods. All experiments on the different video streams run in real time, which further verifies the effectiveness of the system.
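The association step described above (Kalman prediction, Hungarian assignment, and a matching cascade) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each person is reduced to a top-view centroid tracked by an independent constant-velocity Kalman filter per axis, and the matching cost is the Euclidean distance between predicted and detected centroids, whereas DeepSort itself uses an 8-dimensional box state and an appearance (cosine) distance gated by the Mahalanobis distance. The names `Track`, `hungarian`, `matching_cascade`, and `BIG`, and the parameters `max_dist`, `max_age`, `q`, and `r`, are hypothetical, and their default values are illustrative only.

```python
import math

BIG = 1e5  # finite stand-in for gated (forbidden) pairs; inf would break the potentials


def hungarian(cost):
    """Kuhn-Munkres with potentials: (row, col) pairs minimising the total cost."""
    if not cost or not cost[0]:
        return []
    if len(cost) > len(cost[0]):  # the loop below needs rows <= cols, so transpose
        return [(r, c) for c, r in hungarian([list(col) for col in zip(*cost)])]
    n, m = len(cost), len(cost[0])
    u, v, p, way = [0.0] * (n + 1), [0.0] * (m + 1), [0] * (m + 1), [0] * (m + 1)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [math.inf] * (m + 1), [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update the dual potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j]]


class Track:
    """Hypothetical top-view centroid track: one constant-velocity Kalman filter per axis."""

    def __init__(self, track_id, xy, q=1.0, r=1.0):
        self.id, self.q, self.r, self.time_since_update = track_id, q, r, 0
        self.x = [[xy[0], 0.0], [xy[1], 0.0]]  # per axis: [position, velocity]
        self.P = [[[10.0, 0.0], [0.0, 100.0]] for _ in range(2)]  # per-axis covariance

    def position(self):
        return (self.x[0][0], self.x[1][0])

    def predict(self, dt=1.0):
        self.time_since_update += 1
        for a in range(2):  # x <- F x, P <- F P F^T + q I with F = [[1, dt], [0, 1]]
            (p00, p01), (p10, p11) = self.P[a]
            self.x[a] = [self.x[a][0] + dt * self.x[a][1], self.x[a][1]]
            self.P[a] = [[p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
                         [p10 + dt * p11, p11 + self.q]]

    def update(self, xy):
        self.time_since_update = 0
        for a in range(2):  # measurement is position only: H = [1, 0], noise variance r
            (p00, p01), (p10, p11) = self.P[a]
            s = p00 + self.r  # innovation variance
            k0, k1 = p00 / s, p10 / s  # Kalman gain K = P H^T / s
            y = xy[a] - self.x[a][0]  # innovation
            self.x[a] = [self.x[a][0] + k0 * y, self.x[a][1] + k1 * y]
            self.P[a] = [[(1 - k0) * p00, (1 - k0) * p01],  # P <- (I - K H) P
                         [p10 - k1 * p00, p11 - k1 * p01]]


def matching_cascade(tracks, dets, max_dist=3.0, max_age=30):
    """Match the most recently updated tracks first; returns (matches, unmatched dets)."""
    matches, unmatched = [], list(range(len(dets)))
    for level in range(max_age):
        rows = [t for t, trk in enumerate(tracks) if trk.time_since_update == 1 + level]
        if not rows or not unmatched:
            continue
        cost = [[math.dist(tracks[t].position(), dets[d]) for d in unmatched] for t in rows]
        cost = [[c if c <= max_dist else BIG for c in row] for row in cost]  # gating
        found = [(rows[r], unmatched[c]) for r, c in hungarian(cost) if cost[r][c] <= max_dist]
        matches += found
        taken = {d for _, d in found}
        unmatched = [d for d in unmatched if d not in taken]
    return matches, unmatched


# Usage: two existing tracks and three detections in the next top-view frame.
tracks = [Track(0, (0.0, 0.0)), Track(1, (10.0, 0.0))]
for trk in tracks:
    trk.predict()
detections = [(10.5, 0.2), (0.3, -0.1), (50.0, 50.0)]
matches, new_dets = matching_cascade(tracks, detections)
for t, d in matches:
    tracks[t].update(detections[d])
print(sorted(matches), new_dets)  # → [(0, 1), (1, 0)] [2]
```

Matching the most recently updated tracks first is the purpose of the cascade: in DeepSort, a track that has been occluded for several frames has a growing, more uncertain prediction, and without the cascade it could take detections away from tracks seen in the previous frame. A full DeepSort pipeline would also run an IoU-based pass over the leftover recent and unconfirmed tracks and start new tracks from the remaining detections (detection 2 above); both steps are omitted from this sketch.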

An End-to-end Tracking Framework Via Multi-View and Temporal Feature Aggregation

Online Multipedestrian Tracking Based on Fused Detections of Millimeter Wave Radar and Vision

Beyond Traditional Driving Scenes: A Robotic-Centric Paradigm for 2D+3D Human Tracking Using Siamese Transformer Network

Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer

Adaptive Multi-Pedestrian Tracking by Multi-Sensor: Track-to-Track Fusion Using Monocular 3D Detection and MMW Radar

A Top-View Multiple People Tracking System Based on Newest YOLOv5 and DeepSort Using Depth Data

Multi-person Multi-Camera Tracking for Live Stream Videos Based on Improved Motion Model and Matching Cascade

Lifting Multi-View Detection and Tracking to the Bird's Eye View

Vision Based Multi-pedestrian Tracking Using Adaptive Detection and Clustering

Multiple object tracking with appearance feature prediction and similarity fusion

A Deep Top-Down Framework Towards Generalisable Multi-View Pedestrian Detection

Multi-view Aggregation for Real-Time Accurate Object Detection of a Moving Camera

Locality guided cross-modal feature aggregation and pixel-level fusion for multispectral pedestrian detection

Pedestrian Detection with Multi-View Convolution Fusion Algorithm

Multi-Pedestrian Tracking with Clusters

Online Multi-target Tracking for Pedestrian by Fusion of Millimeter Wave Radar and Vision

A novel multi-target multi-camera tracking approach based on feature grouping

Effective multiple pedestrian tracking system in video surveillance with monocular stationary camera

Improving Multiple Pedestrian Tracking in Crowded Scenes with Hierarchical Association

3D Multi-Object Online Tracking with Multi-View Clustering

3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization