Abstract: Crowd counting is an important research topic in computer vision and image processing, and the monitoring and management of crowded scenes is an increasingly prominent concern. Existing methods still suffer from severe overlap in density maps within dense areas, which limits both counting and localization accuracy. This paper studies crowd counting and localization together. First, to address the limited localization performance of density maps in existing algorithms, we optimize the generation method of FIDT maps and decouple the counting and localization tasks. By avoiding overlap in dense areas, the optimized label maps achieve a good balance between counting accuracy and localization, with MAE and MSE of 64.1 and 103.9 on SHHA and 10.9 and 17.4 on SHHB, respectively. Second, to address the scale insensitivity of the encoder and the potential loss of critical features during encoding, we propose the Adaptive Feature Fusion Module and the Multi-Scale Global Attention Upsampling Module, and use them to construct the CALNET network. By reducing redundant features inside and outside the separable branch, the model achieves global fusion of shallow features during decoding. The F1-m scores on the SHHA and SHHB datasets reach 72.9% and 79.4%, respectively, significantly improving the model's performance. Finally, we extend crowd counting and localization algorithms to other domains, including citrus orchards, vehicles, and campus crowds. These experiments validate the robustness and transferability of the network, broaden the application areas of crowd counting and localization algorithms, and leave more scope for future research.
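
The MAE and MSE figures quoted in the abstract are per-image counting errors averaged over a test set. The sketch below shows one way such metrics are commonly computed. It assumes, following the usual convention in the crowd counting literature, that "MSE" denotes the square root of the mean squared count error; the abstract itself does not state the formula. The function name `counting_errors` and the sample counts are hypothetical and are not taken from the paper.

```python
import math


def counting_errors(predicted, ground_truth):
    """Return (MAE, MSE) over per-image crowd counts.

    Assumption: "MSE" is the root of the mean squared error, as is
    common in crowd counting papers; the source does not confirm this.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("inputs must be non-empty and of equal length")

    # Signed count error for each test image.
    diffs = [p - g for p, g in zip(predicted, ground_truth)]

    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, mse


if __name__ == "__main__":
    # Hypothetical predicted and annotated counts for three images.
    mae, mse = counting_errors([10.0, 20.0, 30.0], [12.0, 20.0, 27.0])
    print(f"MAE={mae:.3f}  MSE={mse:.3f}")
```

Because the squared term weights large per-image errors more heavily, the second metric is usually read as an indicator of robustness, while MAE reflects average accuracy.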

Heterogeneous Dual-Attentional Network for WiFi and Video-Fused Multi-modal Crowd Counting

Multi-branch Progressive Embedding Network for Crowd Counting

Crowd Counting in Large Surveillance Areas by Fusing Audio and WiFi Sniffing Data

Adaptive Scheme for Crowd Counting Using Off-the-shelf Wireless Routers

Leverage Multi-Scale Dilated Convolutional Neural Network with Global Attention Feature Fusion for Crowd Counting

Toward Accurate Crowd Counting in Large Surveillance Areas Based on Passive WiFi Sensing

A Crowd Counting and Localization Network Based on Adaptive Feature Fusion and Multi-Scale Global Attention Up Sampling

Multi-modal Crowd Counting via Modal Emulation

Crowd counting method based on the self-attention residual network

Motion-guided Non-local Spatial-Temporal Network for Video Crowd Counting

Dual-branch counting method for dense crowd based on self-attention mechanism

3D Crowd Counting via Geometric Attention-guided Multi-View Fusion

Motional foreground attention-based video crowd counting

MHANet: Multi-scale Hybrid Attention Network for Crowd Counting

A Unified Multi-Task Learning Framework of Real-Time Drone Supervision for Crowd Counting

MLANet: multi-level attention network with multi-scale feature fusion for crowd counting

Crowd density estimation based on multi scale features fusion network with reverse attention mechanism

Dual Path Multi-Scale Fusion Networks with Attention for Crowd Counting

Double multi-scale feature fusion network for crowd counting

Recurrent Attentive Zooming for Joint Crowd Counting and Precise Localization

Wide-Area Crowd Counting: Multi-View Fusion Networks for Counting in Large Scenes