RT3C: Real-Time Crowd Counting in Multi-Scene Video Streams Via Cloud-Edge-Device Collaboration

Rui Wang, Yixue Hao, Yiming Miao, Long Hu, Min Chen
DOI: https://doi.org/10.1109/tsc.2024.3377156
IF: 11.019
2024-01-01
IEEE Transactions on Services Computing
Abstract: Recent advances in edge computing have boosted the deployment of deep-learning-based video analysis systems, overcoming the constrained communication and computing resources of local devices. However, processing multi-scene, high-resolution video streams for crowd surveillance remains a significant challenge, because dynamic video content and communication conditions are difficult to model in a way that supports offloading decisions. To bridge this gap between applications and modeling, this paper presents a Real-Time Cloud-edge-device Collaboration framework that enables fast and accurate Crowd counting (RT3C) on a real dataset. RT3C comprises key frame detection, adaptive patch partition, a patch encoder and decoder, and a computation offloading decision module; together they divide key frames into a minimum number of patches and determine where each patch is offloaded. A Real-Time Multi-Agent Actor-Critic (RTMAAC) algorithm based on multi-agent reinforcement learning is proposed to decide whether each patch is computed by a lightweight model on the edge or by a large model in the cloud. Unlike traditional approaches that ignore content, RTMAAC is a dynamic online decision algorithm driven by both network and video context. Extensive experiments demonstrate that RT3C effectively discriminates valid frames and optimizes offloading decisions in complex environments, outperforming baseline algorithms on two crowd counting datasets. In summary, RT3C provides a promising framework for multi-scene video streams and can be extended to other applications of deep-model-based video computation.
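The abstract names the pipeline stages (key frame detection, patch partition, edge-or-cloud offloading) without giving their internals. The following is a minimal sketch of how such a per-frame pipeline could be wired, under stated assumptions: grayscale frames are nested lists with pixel values in [0, 1], key frames are detected with a mean-absolute-difference threshold, a uniform grid stands in for RT3C's adaptive partition, and a greedy deadline rule stands in for RTMAAC. All function names, thresholds, and numbers here are hypothetical and are not taken from the paper.

```python
from typing import List, Optional

Frame = List[List[float]]  # hypothetical: grayscale frame, pixel values in [0, 1]


def is_key_frame(prev: Optional[Frame], curr: Frame, threshold: float = 0.05) -> bool:
    # Hypothetical key-frame test: mean absolute pixel difference vs. the previous frame.
    if prev is None:
        return True  # the first frame is always treated as a key frame
    total = sum(abs(a - b) for row_p, row_c in zip(prev, curr) for a, b in zip(row_p, row_c))
    n = len(curr) * len(curr[0])
    return total / n > threshold


def partition(frame: Frame, ph: int, pw: int) -> List[Frame]:
    # Fixed-grid split into ph x pw patches (border patches may be smaller).
    # RT3C's partition is adaptive; this uniform grid is only a placeholder.
    h, w = len(frame), len(frame[0])
    return [[row[x:x + pw] for row in frame[y:y + ph]]
            for y in range(0, h, ph) for x in range(0, w, pw)]


def offload_decision(patch_bytes: int, bandwidth_bps: float,
                     cloud_infer_s: float, deadline_s: float) -> str:
    # Greedy stand-in for RTMAAC (not the paper's algorithm): prefer the large
    # cloud model when upload time plus cloud inference meets the deadline,
    # otherwise fall back to the lightweight edge model.
    upload_s = patch_bytes * 8 / bandwidth_bps
    return "cloud" if upload_s + cloud_infer_s <= deadline_s else "edge"


# Usage: a 4x6 frame in which the top-left 2x2 block changes.
f0 = [[0.0] * 6 for _ in range(4)]
f1 = [row[:] for row in f0]
for y in range(2):
    for x in range(2):
        f1[y][x] = 1.0

print(is_key_frame(f0, f1))                        # 4/24 ≈ 0.167 > 0.05, so True
print(len(partition(f1, 2, 2)))                    # 2 rows x 3 cols = 6 patches
print(offload_decision(196_608, 20e6, 0.05, 0.2))  # ≈0.079 s + 0.05 s <= 0.2 s: cloud
print(offload_decision(196_608, 2e6, 0.05, 0.2))   # ≈0.786 s + 0.05 s > 0.2 s: edge
```

A learned policy such as RTMAAC would replace the fixed deadline rule with decisions conditioned on observed network and video context; the greedy rule is shown only because it makes the edge/cloud trade-off explicit.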
What problem does this paper attempt to address?