Abstract: To reduce the heavy annotation burden of training a reliable crowd counting model, and thereby make the model more practical and accurate by letting it benefit from more data, this paper presents a new semi-supervised method based on the mean teacher framework. When labeled data is scarce, the model is prone to overfitting local patches. In such settings, the conventional approach of using unlabeled data solely to improve the accuracy of local patch predictions proves inadequate. We therefore propose a more nuanced approach: fostering the model's intrinsic 'subitizing' capability. This ability lets the model accurately estimate the count in a region by leveraging its understanding of the crowd scene, mirroring the human cognitive process. To this end, we apply masking to unlabeled data, guiding the model to make predictions for the masked patches based on holistic cues. Furthermore, to aid feature learning, we incorporate a fine-grained density classification task. Our method is general and applicable to most existing crowd counting methods, as it imposes no strict structural or loss constraints. In addition, we observe that a model trained with our framework exhibits 'subitizing'-like behavior: it accurately predicts low-density regions with only a 'glance', while incorporating local details to predict high-density regions. Our method achieves state-of-the-art performance, surpassing previous approaches by a large margin on challenging benchmarks such as ShanghaiTech A and UCF-QNRF. The code is available at: https://github.com/cha15yq/MRC-Crowd
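As a rough illustration of the two ingredients the abstract names (masking unlabeled inputs and a mean teacher), the following minimal sketch uses plain Python lists in place of images, density maps, and network weights. It is not the authors' implementation: the function names, the zero-fill masking, the patch-count squared-error consistency term, and the `decay` default are all assumptions made for illustration, and the released code may differ in every one of these choices.

```python
import random


def mask_patches(image, patch, ratio, rng):
    """Zero out a random subset of non-overlapping patch x patch blocks.

    image is a 2D list (H x W); H and W are assumed divisible by patch.
    Returns a masked copy and the sorted (row, col) indices of masked patches.
    """
    h, w = len(image), len(image[0])
    grid = [(r, c) for r in range(h // patch) for c in range(w // patch)]
    n_mask = round(ratio * len(grid))
    chosen = sorted(rng.sample(grid, n_mask))
    masked = [row[:] for row in image]  # copy so the input stays untouched
    for r, c in chosen:
        for y in range(r * patch, (r + 1) * patch):
            for x in range(c * patch, (c + 1) * patch):
                masked[y][x] = 0.0
    return masked, chosen


def patch_count(density, r, c, patch):
    """Sum a density map over one patch, i.e. the predicted count there."""
    return sum(
        density[y][x]
        for y in range(r * patch, (r + 1) * patch)
        for x in range(c * patch, (c + 1) * patch)
    )


def masked_consistency(student_density, teacher_density, chosen, patch):
    """Assumed loss: mean squared count difference on masked patches only."""
    if not chosen:
        return 0.0
    errors = [
        (patch_count(student_density, r, c, patch)
         - patch_count(teacher_density, r, c, patch)) ** 2
        for r, c in chosen
    ]
    return sum(errors) / len(errors)


def ema_update(teacher_weights, student_weights, decay=0.99):
    """Mean-teacher update: teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]
```

In this sketch the student would see the masked image while the teacher sees the original one, so the consistency term can only be reduced by inferring the masked patches from the surrounding scene. That pairing is one plausible reading of the abstract rather than a confirmed detail of the method.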

Scene-Aware Ensemble Learning for Robust Crowd Counting

Multi-branch Progressive Embedding Network for Crowd Counting

Cross-Scene Crowd Counting Via Deep Convolutional Neural Networks

Leverage Multi-Scale Dilated Convolutional Neural Network with Global Attention Feature Fusion for Crowd Counting

Meta-Knowledge and Multi-Task Learning-Based Multi-Scene Adaptive Crowd Counting

Towards A Universal Model for Cross-Dataset Crowd Counting

Adaptive Context Learning Network for Crowd Counting

Attend to Count: Crowd Counting with Adaptive Capacity Multi-Scale CNNs

Unlabeled scene adaptive crowd counting via meta-ensemble learning

Crowd Counting for Real Monitoring Scene

Scene Adaptive Segmentation for Crowd Counting in Population Heterogeneous Distribution

End-to-end Crowd Counting Via Joint Learning Local and Global Count

COMAL: compositional multi-scale feature enhanced learning for crowd counting

Crowd Counting Via Adversarial Cross-Scale Consistency Pursuit

SCLNet: Spatial context learning network for congested crowd counting

Density-Aware Multi-Task Learning for Crowd Counting

Semi-Supervised Crowd Counting with Contextual Modeling: Facilitating Holistic Understanding of Crowd Scenes

Learning Discriminative Features for Crowd Counting

Weakly-Supervised Scene-Specific Crowd Counting Using Real-Synthetic Hybrid Data

Attention Scaling For Crowd Counting