Abstract: Crowd counting, a fundamental computer vision task, plays an important role in fields such as video surveillance, accident prediction, public security, and intelligent transportation. Crowd counting currently faces several challenges. First, because crowd distributions are diverse and population density keeps increasing, large crowds gather in public places, sports stadiums, and stations, which causes severe occlusion. Second, positioning errors made when annotating large-scale datasets can easily affect training results. Third, the size of head targets in dense images is inconsistent, which makes it difficult for a single network to recognize near and far targets simultaneously. Existing crowd counting methods mainly rely on density map regression. However, this framework does not distinguish between the features of distant and near targets and cannot adapt to scale changes, so its performance in sparsely populated areas is poor. To address these problems, we propose an adaptive multi-scale far and near distance network (NF-Net), built on a convolutional neural network (CNN) framework, that counts dense crowds while achieving a good balance between accuracy, inference speed, and performance. First, at the feature level, to enable the model to distinguish near features from far features, we stack convolution layers to deepen the network, allocate different receptive fields according to the distance between the target and the camera, and fuse the features of nearby targets to strengthen pedestrian feature extraction for near targets. Second, depth information is used to distinguish distant and near targets of different scales, and the original image is cut into four patches to perform pixel-level adaptive modeling of the crowd.
In addition, we add the density normalized average precision (nAP) metric to analyze the spatial localization accuracy of our method. We validate the effectiveness of NF-Net on three challenging benchmarks: ShanghaiTech Part A and Part B, UCF_CC_50, and UCF-QNRF. Compared with state-of-the-art (SOTA) methods, it achieves stronger performance across various scenarios. Results on the UCF-QNRF dataset further show that our method effectively handles interference from complex backgrounds.
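
To make the density map regression setting and the four-patch split more concrete, the following is a minimal sketch, not the NF-Net implementation. It assumes head annotations are integer pixel coordinates inside the image, uses a fixed Gaussian width (`sigma`), and cuts the map into four equal quadrants. The abstract does not specify how the four patches are chosen, so the quadrant split and all function names here are hypothetical.

```python
import math


def density_map(points, height, width, sigma=2.0):
    """Build a ground-truth density map from (x, y) head annotations.

    Each head contributes a Gaussian blob renormalized to unit mass,
    so the whole map sums to the number of annotated heads.
    Assumes every point lies inside the image (hypothetical helper).
    """
    dmap = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)
    for px, py in points:
        cells, total = [], 0.0
        # Evaluate the kernel only inside the image, then renormalize,
        # so heads near the border still contribute exactly 1.
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                w = math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                cells.append((y, x, w))
                total += w
        for y, x, w in cells:
            dmap[y][x] += w / total
    return dmap


def split_into_four(dmap):
    """Cut a 2-D map into four quadrants (assumed split, not the paper's)."""
    h, w = len(dmap), len(dmap[0])
    mh, mw = h // 2, w // 2
    return [
        [row[:mw] for row in dmap[:mh]],  # top-left
        [row[mw:] for row in dmap[:mh]],  # top-right
        [row[:mw] for row in dmap[mh:]],  # bottom-left
        [row[mw:] for row in dmap[mh:]],  # bottom-right
    ]


def count(dmap):
    """The predicted or ground-truth count is the integral of the density map."""
    return sum(sum(row) for row in dmap)


if __name__ == "__main__":
    heads = [(5, 4), (20, 10), (28, 30)]  # illustrative annotations only
    dm = density_map(heads, height=32, width=32)
    patches = split_into_four(dm)
    print(round(count(dm), 6), [round(count(p), 3) for p in patches])
```

Because the patches partition the map, the per-patch counts add up to the image-level count. This is what allows a patch-wise model to be evaluated against the same ground truth as a whole-image model.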
