Abstract: Crowd counting is a widely studied yet challenging task in computer vision. The difficulty is particularly pronounced because of scale variations in crowd images. Most state-of-the-art approaches tackle the multi-scale problem by adopting multi-column CNN architectures, in which different columns are designed with different filter sizes to adapt to variable pedestrian/object sizes. However, such structures are bloated and inefficient, and adopting multiple deep columns is infeasible because of the huge resource cost. We instead propose a Scale Pyramid Network (SPN), which adopts a shared single deep column structure and extracts multi-scale information in high layers through a Scale Pyramid Module. In the Scale Pyramid Module, we employ dilated convolutions with different rates in parallel instead of traditional convolutions with different kernel sizes. Compared to other methods of coping with scale issues, our single-column structure with the Scale Pyramid Module obtains more accurate estimates with a simpler structure and lower training complexity. Moreover, the Scale Pyramid Module can be easily applied to a deep network. Experimental results on four datasets show that our method achieves state-of-the-art performance. On the ShanghaiTech Part A dataset, which is challenging because of its highly congested scenes and scale variation, we achieve 9.5% lower MAE and 13.5% lower MSE than the previous state-of-the-art method. We also extend our model to the TRANCOS vehicle counting dataset and achieve 5.9% lower GAME(0), 10% lower GAME(1), 24.5% lower GAME(2), and 38.7% lower GAME(3) than the previous state-of-the-art method. These results demonstrate the robustness of our model for crowd counting, especially under scale variations.
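The core idea of the Scale Pyramid Module (parallel convolutions that share one kernel size but use different dilation rates) can be sketched in NumPy as below. This is a minimal single-channel illustration, not the authors' implementation: the function names, the dilation rates (2, 4, 8, 12), and stacking the branch outputs along a new channel axis are assumptions made for illustration. A k x k kernel dilated by rate r spans k + (k - 1)(r - 1) pixels, so a 3 x 3 kernel at rate 12 covers a 25 x 25 window with only nine weights. This is why dilation can widen the receptive field without the extra parameters of larger filters.

```python
import numpy as np


def effective_kernel_size(k: int, rate: int) -> int:
    """Span covered by a k x k kernel dilated by `rate`: k + (k - 1)(rate - 1)."""
    return k + (k - 1) * (rate - 1)


def dilated_conv2d(x: np.ndarray, kernel: np.ndarray, rate: int) -> np.ndarray:
    """Single-channel dilated cross-correlation with zero 'same' padding.

    Assumes a square, odd-sized kernel so the output has the same shape as `x`.
    """
    k = kernel.shape[0]
    pad = rate * (k // 2)  # enough zero padding to keep the output size unchanged
    xp = np.pad(x, pad)
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # Tap (i, j) reads the input offset by (i - k//2) * rate rows and
            # (j - k//2) * rate columns; these gaps between taps are the dilation.
            out += kernel[i, j] * xp[i * rate:i * rate + h, j * rate:j * rate + w]
    return out


def scale_pyramid_module(x, kernels, rates=(2, 4, 8, 12)):
    """Hypothetical sketch: parallel dilated branches over the same input.

    The rates and stacking the branches (rather than, e.g., summing them) are
    illustrative assumptions, not details confirmed by the abstract.
    """
    branches = [dilated_conv2d(x, kern, r) for kern, r in zip(kernels, rates)]
    return np.stack(branches, axis=0)  # shape: (len(rates), H, W)


if __name__ == "__main__":
    impulse = np.zeros((9, 9))
    impulse[4, 4] = 1.0
    # The nine taps of a 3x3 kernel at rate 2 spread over a 5x5 window.
    print(np.argwhere(dilated_conv2d(impulse, np.ones((3, 3)), rate=2)))
    print(scale_pyramid_module(impulse, [np.ones((3, 3))] * 4).shape)
```

In a trained network, each branch would carry its own learned multi-channel weights, and the branch outputs would typically be fused by a further convolution. Both are omitted here to keep the sketch focused on how the dilation rate changes the spatial extent each branch sees.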
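The metrics quoted in the abstract can be made concrete with a short sketch. It assumes the commonly used definitions. GAME(L) splits each image into 4^L non-overlapping regions (a 2^L x 2^L grid), sums the absolute count errors over those regions, and averages over images, so GAME(0) reduces to MAE on total counts. The "MSE" reported in much of the crowd-counting literature is the square root of the mean squared count error. Whether this paper follows exactly these conventions is an assumption. The function names and the use of np.array_split for uneven region boundaries are illustrative choices.

```python
import numpy as np


def game(pred: np.ndarray, gt: np.ndarray, level: int) -> float:
    """GAME(level) for one image: per-cell absolute count errors summed over a 2^level grid.

    `pred` and `gt` are same-shaped density maps whose sums are object counts.
    np.array_split splits sides that are not divisible as evenly as possible.
    """
    n = 2 ** level
    err = 0.0
    for pr, gr in zip(np.array_split(pred, n, axis=0), np.array_split(gt, n, axis=0)):
        for pc, gc in zip(np.array_split(pr, n, axis=1), np.array_split(gr, n, axis=1)):
            err += abs(pc.sum() - gc.sum())
    return float(err)


def mean_game(preds, gts, level: int) -> float:
    """Average GAME(level) over a set of images."""
    return float(np.mean([game(p, g, level) for p, g in zip(preds, gts)]))


def mae_mse(pred_counts, gt_counts):
    """MAE and the crowd-counting 'MSE' (conventionally a root mean squared error)."""
    d = np.asarray(pred_counts, dtype=float) - np.asarray(gt_counts, dtype=float)
    return float(np.mean(np.abs(d))), float(np.sqrt(np.mean(d ** 2)))


if __name__ == "__main__":
    pred = np.zeros((4, 4))
    pred[0, 0] = 1.0  # right total count, wrong location
    gt = np.zeros((4, 4))
    gt[3, 3] = 1.0
    print([game(pred, gt, lvl) for lvl in range(3)])
    print(mae_mse([1, 3], [2, 1]))
```

Because finer grids penalize a count that is correct in total but placed in the wrong region, gains at GAME(2) and GAME(3) reflect better localization of the estimated density, not only better total counts.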

Counting Varying Density Crowds Through Density Guided Adaptive Selection CNN and Transformer Estimation

Relevant Region Prediction for Crowd Counting

Scale Pyramid Network For Crowd Counting

Multi-branch Progressive Embedding Network for Crowd Counting

Attention Scaling For Crowd Counting

Attend to Count: Crowd Counting with Adaptive Capacity Multi-Scale CNNs

Density-Aware Multi-Task Learning for Crowd Counting

Transformer-CNN Hybrid Network for Crowd Counting

Concise Convolutional Neural Network for Crowd Counting

Leverage Multi-Scale Dilated Convolutional Neural Network with Global Attention Feature Fusion for Crowd Counting

CLDE-Net: crowd localization and density estimation based on CNN and transformer network

CCTrans: Simplifying and Improving Crowd Counting with Transformer

CrowdTrans: Learning top-down visual perception for crowd counting by transformer

An encoder-decoder network for crowd counting based on multi-scale attention mechanism

Crowd Counting with Density Adaption Networks

Crowd Transformer Network

CACrowdGAN: Cascaded Attentional Generative Adversarial Network for Crowd Counting

Dual-branch counting method for dense crowd based on self-attention mechanism

Cascade-guided multi-scale attention network for crowd counting

Crowd density estimation based on multi scale features fusion network with reverse attention mechanism

Crowd Counting Method Based on Convolutional Neural Network with Global Density Feature