DRENet: Giving Full Scope to Detection and Regression-Based Estimation for Video Crowd Counting

Changsheng Liu, Yuan Huang, Yadong Mu, Xiaoming Yu
DOI: https://doi.org/10.1007/978-3-030-86340-1_2
2021-01-01
Abstract: Existing deep learning-based video crowd counting methods mainly leverage temporal correlation to improve the model. Although their results are comparable, most of these methods disregard the fact that crowd density varies enormously across the spatial and temporal domains of videos, which hinders further improvement in video crowd counting performance. To overcome this issue, a new detection and regression estimation network, named DRENet, is proposed. It first estimates the crowd density by separately generating two density maps: one based on video object detection and one based on mixed 3D-2D convolution (regression-based). The detection-based and regression-based methods function well in sparse and congested scenes, respectively. Moreover, a multi-column attention-based fusion block is proposed to perceive the crowd density in a frame and to adaptively allocate the relative weights for the video detection-based and regression-based estimations. The optimal crowd counts are then obtained with guidance from the attention block. Experimental results demonstrate that our method achieves state-of-the-art performance on three public video crowd counting datasets.
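The fusion idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how the attention block combines the two estimates, so the per-pixel convex blend below, the function names `fuse_density_maps` and `crowd_count`, and the hand-written attention weights are all assumptions made for illustration. In DRENet the weights would come from the learned multi-column attention block rather than being fixed by hand.

```python
def fuse_density_maps(det_map, reg_map, attn_map):
    """Blend two density maps pixel by pixel (hypothetical fusion rule).

    attn_map holds weights in [0, 1]: the share given to the detection-based
    estimate; the regression-based estimate receives the remainder.
    """
    fused = []
    for det_row, reg_row, attn_row in zip(det_map, reg_map, attn_map):
        fused.append([a * d + (1.0 - a) * r
                      for d, r, a in zip(det_row, reg_row, attn_row)])
    return fused


def crowd_count(density_map):
    """A density map sums to the estimated number of people."""
    return sum(sum(row) for row in density_map)


# Toy 2x2 frame: the left column is sparse, the right column is congested.
det = [[1.0, 0.5], [0.0, 0.5]]   # detection-based density (assumed values)
reg = [[0.6, 2.0], [0.2, 3.0]]   # regression-based density (assumed values)
attn = [[1.0, 0.0], [1.0, 0.0]]  # trust detection on the left, regression on the right

fused = fuse_density_maps(det, reg, attn)
print(crowd_count(fused))
```

With these weights the fused map takes the detection estimate in the sparse region and the regression estimate in the congested region, which mirrors the division of labor the abstract describes.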