Real-time Monocular 3D People Localization and Tracking on Embedded System

Yipeng Zhu, Tao Wang, Shiqiang Zhu
DOI: https://doi.org/10.1109/icarm52023.2021.9536118
2021-01-01
Abstract: Localizing people in 3D space, rather than in the original 2D image plane, provides a more comprehensive understanding of the scene and opens up more potential applications. However, inferring 3D locations usually requires a stereo camera or additional sensors, since deriving depth information from a single image is regarded as an ill-posed problem. With recent progress in deep learning, depth estimation neural networks can produce convincing depth maps from a single RGB image. This work develops a people localization and tracking method based on a monocular camera. Specifically, an efficient self-supervised monocular depth estimation method is adopted to generate a pseudo depth map. 2D object detection results are then used to find accurate people locations. Finally, a filter-based tracking method fuses temporal information to improve accuracy. Aiming to provide a real-time solution for people tracking on embedded systems, our methods are deployed and tested on an NVIDIA Jetson Xavier NX Developer Kit. The proposed efficient localization and tracking method is validated by a series of field tests. The overall system runs at 12 fps with acceptable accuracy compared to ground truth.
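The abstract does not specify how the pseudo depth map and the 2D detections are combined, or which filter the tracker uses. The following is a minimal sketch of one common way such a pipeline can work, under stated assumptions: each detected box is back-projected through a pinhole camera model using the median pseudo depth inside the box, and each person is then smoothed with a constant-velocity Kalman filter applied independently per axis. The names `Intrinsics`, `localize_person`, `AxisKalman`, and `PersonTrack`, the median-depth rule, and the noise values `q` and `r` are all hypothetical illustrations, not the authors' implementation.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole intrinsics in pixels (assumed known from calibration)."""
    fx: float
    fy: float
    cx: float
    cy: float


def localize_person(depth, box, K):
    """Back-project one 2D person box to a camera-frame (x, y, z) in metres.

    depth: row-major 2D list of pseudo depth values in metres.
    box: (u_min, v_min, u_max, v_max) in pixels, max bounds exclusive.
    The median depth inside the box is a hypothetical robust estimate,
    not necessarily the rule used in the paper.
    """
    u0, v0, u1, v1 = box
    samples = [depth[v][u] for v in range(v0, v1) for u in range(u0, u1)
               if depth[v][u] > 0]
    if not samples:
        return None  # no valid depth inside the box
    z = statistics.median(samples)
    u, v = (u0 + u1) / 2.0, (v0 + v1) / 2.0  # box centre in pixels
    # Inverse pinhole projection (x right, y down, z forward).
    return ((u - K.cx) * z / K.fx, (v - K.cy) * z / K.fy, z)


class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate axis."""

    def __init__(self, pos, q=0.5, r=0.25):
        self.x = [pos, 0.0]                 # state: [position, velocity]
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.q = q  # white-noise acceleration variance (assumed value)
        self.r = r  # measurement noise variance in m^2 (assumed value)

    def predict(self, dt):
        p, v = self.x
        self.x = [p + v * dt, v]
        P, q = self.P, self.q
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and
        # Q = q * [[dt^4/4, dt^3/2], [dt^3/2, dt^2]].
        self.P = [
            [P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1]
             + q * dt**4 / 4,
             P[0][1] + dt * P[1][1] + q * dt**3 / 2],
            [P[1][0] + dt * P[1][1] + q * dt**3 / 2,
             P[1][1] + q * dt**2],
        ]

    def update(self, z):
        P = self.P
        s = P[0][0] + self.r                # innovation variance, H = [1, 0]
        k0, k1 = P[0][0] / s, P[1][0] / s   # Kalman gain
        y = z - self.x[0]                   # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]


class PersonTrack:
    """Tracks one person's 3D position with three independent axis filters."""

    def __init__(self, xyz):
        self.axes = [AxisKalman(c) for c in xyz]

    def step(self, xyz, dt):
        """Predict by dt seconds, fuse the new 3D measurement, return position."""
        for f, z in zip(self.axes, xyz):
            f.predict(dt)
            f.update(z)
        return tuple(f.x[0] for f in self.axes)


if __name__ == "__main__":
    K = Intrinsics(fx=100.0, fy=100.0, cx=5.0, cy=5.0)  # hypothetical camera
    depth = [[2.0] * 10 for _ in range(10)]             # flat 2 m pseudo depth
    xyz = localize_person(depth, (6, 2, 10, 8), K)
    track = PersonTrack(xyz)
    print(xyz, track.step(xyz, dt=1 / 12))  # dt matches a 12 fps frame rate
```

Decoupling the axes keeps the sketch dependency-free; a fuller tracker would typically use a single six-state filter (position and velocity in 3D) plus data association between detections and existing tracks, which the abstract does not describe.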