Abstract: Crowd counting is an important research topic in computer vision and image processing, and the monitoring and management of crowded scenes is an increasingly prominent concern. Existing methods still suffer from severe overlap in density maps within dense areas, which limits both counting and localization accuracy. This paper addresses crowd counting and localization in three parts. First, to address the limited localization performance of density maps in existing algorithms, we optimize the generation method of FIDT maps and decouple the counting and localization tasks. Because the optimized label maps avoid overlap in dense areas, they achieve a good balance between counting accuracy and localization, with MAE and MSE of 64.1 and 103.9 on SHHA and 10.9 and 17.4 on SHHB, respectively. Second, to address the scale insensitivity of the encoder and the potential loss of critical features during encoding, we propose the Adaptive Feature Fusion Module and the Multi-Scale Global Attention Upsampling Module and use them to construct the CALNET network. By reducing redundant features inside and outside the separable branch, the model achieves global fusion of shallow features during decoding. The F1-m scores on the SHHA and SHHB datasets reach 72.9% and 79.4%, respectively, a significant improvement in model performance. Finally, we extend crowd counting and localization algorithms to other domains, including citrus orchards, vehicles, and campus crowds. These experiments validate the robustness and transferability of the network, broaden the range of applications for crowd counting and localization algorithms, and open a broader space for future research.
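The counting results above are reported as MAE and MSE over per-image counts. As a minimal sketch of how such figures are typically computed, the function below assumes the common crowd-counting convention in which "MSE" is the square root of the mean squared count error; the abstract does not state which definition is used. The function name `counting_errors` and the sample counts are hypothetical and chosen only for illustration.

```python
import math


def counting_errors(predicted, ground_truth):
    """Return (MAE, MSE) over per-image crowd counts.

    Assumption: MSE is the root of the mean squared error, as is common
    in crowd-counting papers; the abstract does not confirm this.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    # Signed count error for each image.
    diffs = [p - g for p, g in zip(predicted, ground_truth)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, mse


# Hypothetical predicted and annotated counts for three images.
pred_counts = [105.0, 48.0, 310.0]
true_counts = [100.0, 50.0, 300.0]
mae, mse = counting_errors(pred_counts, true_counts)
print(f"MAE={mae:.2f}  MSE={mse:.2f}")
```

Because the squared term penalizes large per-image errors more heavily, MSE computed this way is never smaller than MAE, which is consistent with the SHHA and SHHB figures quoted above.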

A Dynamic-Attention On Crowd Region With Physical Optical Flow Features For Crowd Counting

Relevant Region Prediction for Crowd Counting

Multi-branch Progressive Embedding Network for Crowd Counting

Spatial-Frequency Attention Network for Crowd Counting

Attention Scaling For Crowd Counting

Motional foreground attention-based video crowd counting

Leverage Multi-Scale Dilated Convolutional Neural Network with Global Attention Feature Fusion for Crowd Counting

DA²Net: a dual attention-aware network for robust crowd counting

Encoder-Decoder Network Fusing Channel and Spatial Attention for Crowd Counting

Triple Attention For Robust Video Crowd Counting

Dual Path Multi-Scale Fusion Networks with Attention for Crowd Counting

A multi-scale fusion and dual attention network for crowd counting

Attentional Neural Fields for Crowd Counting

DDRANet: A Dynamic Density-Region-Aware Network for Crowd Counting

FDCNet: Frontend-Backend Fusion Dilated Network Through Channel-Attention Mechanism

Crowd Counting Using Deep Recurrent Spatial-Aware Network

A Crowd Counting and Localization Network Based on Adaptive Feature Fusion and Multi-Scale Global Attention Up Sampling

Single-column CNN for crowd counting with pixel-wise attention mechanism

Attentive Encoder-Decoder Networks for Crowd Counting

Attention Guided Region Division for Crowd Counting

Cascade-guided multi-scale attention network for crowd counting