MSNet: Multi-scale Network for Crowd Counting

Ying Shi, Jun Sang, Mohammad S. Alam, Xinyue Liu, Shaoli Tian
DOI: https://doi.org/10.1117/12.2592677
2021-01-01
Abstract: Crowd counting has become an active topic in computer vision, and it remains challenging because of large scale variation across crowds, mutual occlusion, perspective distortion and similar factors. To address the large scale variation present in images, we propose a novel multi-scale network, MSNet, which aims to preserve continuous scale variation and count pedestrians accurately. Although most state-of-the-art multi-scale and multi-column networks aim to integrate scale information from heads of different sizes, much work remains before continuous scale variation is captured. Specifically, MSNet uses the first ten layers of the Visual Geometry Group network (VGG) as the backbone to extract coarse image features, and employs a multi-scale block, which contains several kernels with different receptive fields, to preserve scale information and better handle scale variation. Inspired by the observation that replacing a single large receptive field kernel with multiple small ones yields better performance, we use two dilated convolutions with a receptive field of 5 to replace the large kernel. MSNet incurs a moderate increase in computation. We evaluate our method on three benchmark datasets, ShanghaiTech (Part A: MAE=59.6, RMSE=96.1; Part B: MAE=7.5, RMSE=12.1), UCF-CC-50 (MAE=207.9, RMSE=273.8) and UCF-QNRF (MAE=93, RMSE=158), to demonstrate the superior performance of our method.
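The receptive-field figure and the reported metrics can be illustrated with a minimal sketch. The abstract does not give the kernel size or dilation rate of the dilated convolutions, so the 3x3 kernel with dilation 2 used below is an assumption, chosen only because it yields the stated receptive field of 5. The function names are hypothetical, and the MAE and RMSE helpers follow the usual per-image count definitions rather than the authors' evaluation code.

```python
import math


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a single dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers) -> int:
    """Receptive field of stride-1 convolutions applied one after another.

    `layers` is an iterable of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += effective_kernel_size(kernel_size, dilation) - 1
    return rf


def mae(predicted, actual) -> float:
    """Mean absolute error between predicted and true per-image counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual) -> float:
    """Root mean squared error between predicted and true per-image counts."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )


# Assumed configuration: a 3x3 kernel with dilation 2 spans 5 pixels per side.
assumed_span = effective_kernel_size(3, 2)

# Two stacked plain 3x3 convolutions also reach a receptive field of 5,
# which is the usual argument for replacing one large kernel with small ones.
stacked_span = stacked_receptive_field([(3, 1), (3, 1)])

# Toy counts, invented purely to exercise the metric helpers.
toy_mae = mae([10, 20], [12, 16])
toy_rmse = rmse([10, 20], [12, 16])
```

Under these assumptions a dilated 3x3 kernel covers the same 5x5 window as a dense 5x5 kernel while using 9 weights per channel pair instead of 25, which is consistent with the moderate computational cost the abstract mentions.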