Crowd Counting by Multi-Scale Dilated Convolution Networks
Jingwei Dong, Ziqi Zhao, Tongxin Wang
DOI: https://doi.org/10.3390/electronics12122624
IF: 2.9
2023-06-11
Electronics
Abstract: The number of people in a crowd is crucial information in public safety, intelligent monitoring, traffic management, architectural design, and other fields. At present, counting accuracy in public spaces is still limited by unavoidable conditions such as the uneven distribution of a crowd and the variation in head scale caused by people's differing distances from the camera. To address these problems, we propose a deep learning crowd counting model, multi-scale dilated convolution networks (MSDCNet), based on crowd density map estimation. MSDCNet consists of three parts. The front-end network uses a truncated VGG16 to obtain preliminary features of the input image, with a proposed spatial pyramid pooling (SPP) module replacing the max-pooling layer to extract scale-invariant features. The core network is our proposed multi-scale feature extraction network (MFENet), which extracts features at three different scales. The back-end network consists of consecutive dilated convolution layers, rather than the traditional alternation of convolution and pooling, to expand the receptive field, extract high-level semantic information, and avoid losing the spatial features of small-scale heads. Experimental results on three public datasets show that the proposed model handles these problems satisfactorily and achieves better counting accuracy than representative models in terms of mean absolute error (MAE) and mean square error (MSE).
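As a minimal sketch of how density-map counting and the two reported metrics fit together, the Python below sums a density map to obtain a count and then computes MAE and MSE over a set of images. The function names and the toy numbers are illustrative assumptions, not taken from the paper. Many crowd counting papers report the square root of the mean squared error under the name MSE; the abstract does not say which form is used here, so both are shown.

```python
from math import sqrt


def count_from_density(density_map):
    """Estimated count: the sum of all pixel values of a density map."""
    return sum(sum(row) for row in density_map)


def mae(predicted, actual):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mse(predicted, actual):
    """Mean of the squared count errors (no square root)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def root_mse(predicted, actual):
    """Square root of mse(); assumed variant often reported as 'MSE'."""
    return sqrt(mse(predicted, actual))


# Toy values, purely illustrative (not results from the paper).
toy_density = [[0.5, 0.5], [1.0, 0.0]]
toy_count = count_from_density(toy_density)

predicted_counts = [10.0, 20.0, 30.0]
actual_counts = [12.0, 20.0, 27.0]
toy_mae = mae(predicted_counts, actual_counts)
toy_mse = mse(predicted_counts, actual_counts)
toy_root_mse = root_mse(predicted_counts, actual_counts)
```

Lower values of either metric indicate better counting accuracy; the squared form penalizes large errors on individual images more heavily than MAE does.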
engineering, electrical & electronic; computer science, information systems; physics, applied