Semantic-refined Spatial Pyramid Network for Crowd Counting

Lifang Zhou, Peiwen Wang, Weisheng Li, Jiaxu Leng, Bangjun Lei
DOI: https://doi.org/10.1016/j.patrec.2022.04.029
IF: 4.757
2022-01-01
Pattern Recognition Letters
Abstract: In this paper, we propose a novel encoder-decoder model, the Semantic-refined Spatial Pyramid Network (SSPNet), a scale-aware counting network that generates high-quality density maps in order to estimate the number of people in a crowd accurately. SSPNet consists of a front-end based on VGG-16, a spatial pyramid multi-scale module (SPMM), and a semantic enhancement module (SEM). First, a series of convolutional layers is used as the front-end to extract deeper features without extra computational cost. Next, the SPMM, which has a spatial pyramid structure with multiple receptive fields, captures multi-scale features. The SEM then refines the features captured by the SPMM, using deep semantic information to better integrate the multi-scale features. Finally, shallow texture information is used to restore detail in the feature map and so improve the quality of the density map. Extensive experiments and comparisons on three challenging datasets (ShanghaiTech Part_A and Part_B, UCF_CC_50, and UCF-QNRF) demonstrate the superiority of our method. (C) 2022 Elsevier B.V. All rights reserved.
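To make the link between a density map and a crowd count concrete, the following is a minimal sketch of the general density-map convention used in crowd counting, not the SSPNet implementation. It assumes each annotated head is represented by a Gaussian blob of unit mass, so that summing the map gives the count. The function names (`gaussian_kernel`, `density_map`, `estimate_count`), the kernel size, the sigma value, and the head coordinates are all hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    kernel = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def density_map(height, width, heads, size=7, sigma=1.5):
    """Build a density map with one unit-mass Gaussian per head (row, col).

    Assumption: every head lies inside the image. The kernel is
    renormalized over its in-bounds cells so that heads near the border
    still contribute exactly 1 to the total.
    """
    dmap = [[0.0] * width for _ in range(height)]
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    for r, c in heads:
        cells = []
        for dy in range(size):
            for dx in range(size):
                y, x = r + dy - half, c + dx - half
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy][dx]))
        mass = sum(w for _, _, w in cells)
        for y, x, w in cells:
            dmap[y][x] += w / mass
    return dmap


def estimate_count(dmap):
    """The estimated count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in dmap)


# Hypothetical annotations: four heads in a 16 x 20 image.
heads = [(2, 3), (10, 10), (0, 0), (15, 19)]
dmap = density_map(16, 20, heads)
count = estimate_count(dmap)
print(round(count, 6))
```

In a counting network, the model predicts such a map from the image, and the predicted count is obtained by the same summation; a map of this kind would typically serve as the training target.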