SRNet: Scale-Aware Representation Learning Network for Dense Crowd Counting

Liangjun Huang, Luning Zhu, Shihui Shen, Qing Zhang, Jianwei Zhang
DOI: https://doi.org/10.1109/access.2021.3115963
IF: 3.9
2021-01-01
IEEE Access
Abstract: Large variations in the scale of people within an image make crowd counting extremely challenging. Many existing methods address scale variation with multi-column architectures, but these are typically complex, have large numbers of parameters, and are difficult to optimize. To this end, we propose a scale-aware representation learning network (SRNet) built on the widely used encoder-decoder framework. In the encoder, the first ten layers of VGG16 convert an image into deep features. The decoder then regresses these features to a crowd density map. The decoder consists mainly of two modules: the scale-aware feature learning module (SAM) and the pixel-aware upsampling module (PAM). SAM models the multi-scale features of a crowd at each level using receptive fields of different sizes, and PAM enlarges the spatial resolution and enhances the pixel-level semantic information, thereby improving the overall counting accuracy. We conduct extensive crowd counting experiments on the ShanghaiTech Part_A, UCF-QNRF, and UCF_CC_50 datasets. Furthermore, to obtain the location of each person, we conduct crowd localization experiments on the UCF-QNRF and NWPU-Crowd datasets. The qualitative and quantitative results demonstrate the effectiveness of SRNet in dense crowd counting and crowd localization tasks.
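As a rough illustration of the encoder-decoder pipeline described in the abstract, the following sketch traces feature-map shapes through a VGG16 front end and reads a count off a density map. It is not the authors' implementation. The layer list `VGG16_FIRST_TEN` assumes the standard VGG16 configuration truncated after its tenth convolution with all three intermediate max-pooling layers kept, which the abstract does not confirm. The helper names `encoder_output_shape`, `upsampling_factor`, and `count_from_density` are hypothetical, and the SAM and PAM modules are not modeled.

```python
# Assumed layout: the first ten conv layers of standard VGG16, given as output
# channel widths, with "M" marking a 2x2 max-pool of stride 2. Whether SRNet
# keeps all three pooling layers is an assumption, not stated in the abstract.
VGG16_FIRST_TEN = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M", 512, 512, 512]


def encoder_output_shape(height, width, cfg=VGG16_FIRST_TEN):
    """Return (channels, height, width) of the encoder features.

    3x3 convolutions with padding 1 keep the spatial size; each 2x2 max-pool
    with stride 2 halves it (floor division).
    """
    channels = 3  # RGB input
    for item in cfg:
        if item == "M":
            height //= 2
            width //= 2
        else:
            channels = item
    return channels, height, width


def upsampling_factor(cfg=VGG16_FIRST_TEN):
    """Factor a decoder would need to restore the input resolution."""
    return 2 ** cfg.count("M")


def count_from_density(density_map):
    """Estimated crowd count: the sum of all values in the density map."""
    return sum(sum(row) for row in density_map)


if __name__ == "__main__":
    print(encoder_output_shape(768, 1024))
    print(upsampling_factor())
    print(count_from_density([[0.5, 0.5], [1.0, 0.0]]))
```

Under these assumptions the encoder features sit at one eighth of the input resolution, which is why a decoder that regresses a density map benefits from an upsampling stage such as the PAM the abstract mentions. Summing the density map to obtain the count is the standard convention in density-based crowd counting.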
computer science, information systems; telecommunications; engineering, electrical & electronic