Abstract: Crowd counting and localization have become increasingly important in computer vision due to their wide-ranging applications. While point-based strategies have been widely used in crowd counting methods, they face a significant challenge, i.e., the lack of an effective learning strategy to guide the matching process. This deficiency leads to instability in matching point proposals to target points, adversely affecting overall performance. To address this issue, we introduce an effective approach to stabilize the proposal-target matching in point-based methods. We propose Auxiliary Point Guidance (APG) to provide clear and effective guidance for proposal selection and optimization, addressing the core issue of matching uncertainty. Additionally, we develop Implicit Feature Interpolation (IFI) to enable adaptive feature extraction in diverse crowd scenarios, further enhancing the model's robustness and accuracy. Extensive experiments demonstrate the effectiveness of our approach, showing significant improvements in crowd counting and localization performance, particularly under challenging conditions. The source code and trained models will be made publicly available.
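
The matching instability described in the abstract can be illustrated with a toy one-to-one matcher. This is a minimal, hypothetical sketch, not this paper's code. It assumes a P2PNet-style cost of `tau * distance - confidence`, and the cost form, `tau = 0.05`, and the function names `match_cost` and `match` are assumptions. It also uses exhaustive search, which is only practical for a handful of points; real implementations typically use the Hungarian algorithm. The example shows that swapping two nearly equal confidences changes which proposal the single target is matched to.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def match_cost(target: Point, proposal: Point, conf: float, tau: float = 0.05) -> float:
    """Assumed matching cost: weighted Euclidean distance minus proposal confidence."""
    return tau * math.dist(target, proposal) - conf


def match(targets: Sequence[Point], proposals: Sequence[Point],
          confs: Sequence[float], tau: float = 0.05) -> List[int]:
    """One-to-one matching; result[i] is the index of the proposal assigned to target i.

    Exhaustive search over all assignments is only feasible for toy inputs.
    """
    best_cost, best = math.inf, []
    for perm in itertools.permutations(range(len(proposals)), len(targets)):
        cost = sum(match_cost(t, proposals[j], confs[j], tau)
                   for t, j in zip(targets, perm))
        if cost < best_cost:
            best_cost, best = cost, list(perm)
    return best


if __name__ == "__main__":
    target = [(10.0, 10.0)]
    proposals = [(9.0, 10.0), (11.5, 10.0)]  # 1.0 and 1.5 pixels from the target
    print(match(target, proposals, [0.50, 0.53]))  # -> [1]
    print(match(target, proposals, [0.53, 0.50]))  # -> [0]
```

During training, predicted confidences drift slightly between iterations. A target can therefore alternate between neighbouring proposals in this way, which is the kind of oscillation a stabilized matching strategy is meant to suppress.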

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses a core issue in point-based crowd counting and localization methods: the **instability of proposal-target matching**. Existing point-based methods lack an effective learning strategy to guide matching. As a result, the assignment between point proposals and ground-truth target points is unstable during training. This instability makes it difficult for the model to make consistent decisions during optimization, which degrades overall performance.

### Background and Challenges

1. **Importance of Crowd Counting and Localization**:
   - Crowd counting and localization have become increasingly important in computer vision, with wide applications in surveillance, event management, and urban planning.
   - Accurately estimating crowd size and locating individuals is challenging because of fluctuations in crowd density, occlusions, and environmental changes.
2. **Limitations of Existing Methods**:
   - **Map-based Methods**: Generate density maps by placing Gaussian kernels at annotated points. They perform well for counting, but the kernels overlap in dense areas and multi-scale representations are required, which makes precise localization difficult.
   - **Detection-based Methods**: Localize people by generating pseudo ground-truth bounding boxes. However, their accuracy is limited in both highly crowded and sparse areas, and they often require complex post-processing.
   - **Point-based Methods**: Use point annotations directly as learning targets, which simplifies localization. However, they suffer from unstable proposal-target matching, which leads to underestimation or overestimation in local areas.

### Solutions

1. **Auxiliary Point Guidance (APG)**:
   - Introduces an explicit guidance mechanism: auxiliary positive and negative points are generated around each ground-truth point to stabilize the matching process.
   - The predicted confidence of an auxiliary positive point should be close to 1, and its predicted offset should match the random displacement added when it was generated. The confidence of an auxiliary negative point should be close to 0, and its offset should be close to 0.
   - These auxiliary points help the network select and optimize proposals more effectively, so that each ground-truth point consistently selects the same positive proposal during matching.
2. **Implicit Feature Interpolation (IFI)**:
   - Extracts features at arbitrary locations by interpolating the four nearest feature vectors on the feature map, weighted by their distances from the target location.
   - This improves the robustness and accuracy of the model across diverse crowd scenarios.

### Experimental Results

- **Crowd Counting**: Extensive experiments on multiple datasets show that APGCC, the proposed model, significantly outperforms existing methods on the MAE and MSE metrics, especially in challenging scenarios.
- **Crowd Localization**: APGCC also performs well in localization, particularly in F1 score, precision, and recall at both small and large distance thresholds.

### Conclusion

This paper introduces auxiliary point guidance and implicit feature interpolation to address the instability of proposal-target matching in point-based crowd counting and localization. Together, these methods significantly improve the accuracy and robustness of the model.
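
The auxiliary-point supervision in APG can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices:

- the names (`AuxPoint`, `sample_ring`, `make_auxiliary_points`, `auxiliary_loss`);
- the number of auxiliary points per ground-truth point;
- the sampling radii (positives within 2 pixels, negatives 4 to 8 pixels away);
- the loss form (binary cross-entropy on confidence plus squared offset error).

The positive offset target is taken to be the vector from the auxiliary point back to its ground-truth point, i.e. the negated random displacement. This is one reading of the offset requirement for auxiliary positives.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class AuxPoint:
    position: Point       # location of the auxiliary proposal
    conf_target: float    # 1.0 for auxiliary positives, 0.0 for auxiliary negatives
    offset_target: Point  # regression target for the predicted offset


def sample_ring(center: Point, r_min: float, r_max: float, rng: random.Random) -> Point:
    """Sample a point at a random angle whose distance from `center` lies in [r_min, r_max]."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    radius = rng.uniform(r_min, r_max)
    return (center[0] + radius * math.cos(angle), center[1] + radius * math.sin(angle))


def make_auxiliary_points(gt_points: Sequence[Point], k_pos: int = 2, k_neg: int = 2,
                          r_pos: float = 2.0, r_neg: Tuple[float, float] = (4.0, 8.0),
                          seed: int = 0) -> List[AuxPoint]:
    """Hypothetical generator: k_pos positives near each GT point, k_neg negatives farther out."""
    rng = random.Random(seed)
    aux: List[AuxPoint] = []
    for gx, gy in gt_points:
        for _ in range(k_pos):
            px, py = sample_ring((gx, gy), 0.0, r_pos, rng)
            # Positive: confidence 1, offset undoes the random displacement.
            aux.append(AuxPoint((px, py), 1.0, (gx - px, gy - py)))
        for _ in range(k_neg):
            nx, ny = sample_ring((gx, gy), r_neg[0], r_neg[1], rng)
            # Negative: confidence 0, offset 0.
            aux.append(AuxPoint((nx, ny), 0.0, (0.0, 0.0)))
    return aux


def auxiliary_loss(aux: Sequence[AuxPoint], pred_conf: Sequence[float],
                   pred_offset: Sequence[Point], offset_weight: float = 1.0) -> float:
    """Mean of binary cross-entropy on confidence plus weighted squared offset error."""
    eps = 1e-7
    total = 0.0
    for a, c, (ox, oy) in zip(aux, pred_conf, pred_offset):
        c = min(max(c, eps), 1.0 - eps)  # clamp so both log() calls stay finite
        bce = -(a.conf_target * math.log(c) + (1.0 - a.conf_target) * math.log(1.0 - c))
        tx, ty = a.offset_target
        total += bce + offset_weight * ((ox - tx) ** 2 + (oy - ty) ** 2)
    return total / max(len(aux), 1)


if __name__ == "__main__":
    aux = make_auxiliary_points([(10.0, 10.0)])
    perfect = auxiliary_loss(aux, [a.conf_target for a in aux], [a.offset_target for a in aux])
    print(len(aux), round(perfect, 4))  # -> 4 0.0
```

Here the predictions are passed in as plain lists. In a trained model they would presumably come from the prediction head evaluated at the auxiliary positions, so the auxiliary terms give that head an unambiguous positive and negative signal around every ground-truth point.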
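
The interpolation step behind IFI can be sketched as a distance-weighted blend of the four nearest feature vectors. This minimal version makes several assumptions:

- feature-map cell (row i, column j) is centred at coordinate (j, i);
- bilinear weights, (1 - |dx|)(1 - |dy|), serve as the distance-based weighting;
- the function name `interpolate_feature` is hypothetical.

Any learned component that makes the paper's interpolation "implicit" is omitted.

```python
import math
from typing import List, Sequence

FeatureMap = Sequence[Sequence[Sequence[float]]]  # indexed as [row][col][channel]


def interpolate_feature(fmap: FeatureMap, x: float, y: float) -> List[float]:
    """Blend the four feature vectors nearest to the continuous location (x, y).

    Closer neighbours receive larger weights, and the four weights always sum to 1.
    """
    height, width = len(fmap), len(fmap[0])
    # Clamp the query so all neighbour indices stay inside the map.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0  # fractional distances from the top-left neighbour
    neighbours = [
        (y0, x0, (1.0 - fx) * (1.0 - fy)),
        (y0, x1, fx * (1.0 - fy)),
        (y1, x0, (1.0 - fx) * fy),
        (y1, x1, fx * fy),
    ]
    out = [0.0] * len(fmap[0][0])
    for row, col, weight in neighbours:
        for c, value in enumerate(fmap[row][col]):
            out[c] += weight * value
    return out


if __name__ == "__main__":
    fmap = [[[0.0], [1.0]],
            [[2.0], [3.0]]]
    print(interpolate_feature(fmap, 0.5, 0.5))  # -> [1.5]
```

Because the weights sum to 1, a query that lands exactly on a cell centre returns that cell's feature, and queries outside the map are clamped to the border.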

Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
