CrowdTrans: Learning top-down visual perception for crowd counting by transformer

Weiyu Guo, Shaopeng Yang, Yuheng Ren, Yongzhen Huang
DOI: https://doi.org/10.2139/ssrn.4706202
IF: 6
2024-04-05
Neurocomputing
Abstract: Recent crowd counting methods rely on density maps as an intermediate representation, where the ground-truth density map is obtained by convolving dot annotations with a fixed Gaussian kernel. However, perspective effects introduce scale variation among targets, which makes generalization across scenes a significant challenge. Existing approaches accommodate only a limited number of scales in density map generation and prediction. To address this problem, we introduce a novel transformer network, CrowdTrans, which incorporates a two-channel, task-based density map estimator and generator. This approach learns a density map by leveraging both pixel-wise classification and regression. Furthermore, we devise an end-to-end framework that jointly learns the density map estimator and the corresponding label generator. Extensive experiments on widely used datasets show that the proposed method achieves state-of-the-art performance, validating the effectiveness of our designs.
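For readers unfamiliar with the fixed-kernel baseline the abstract refers to, the following is a minimal sketch of how a ground-truth density map can be built from dot annotations with a single Gaussian kernel. It is not the CrowdTrans generator. The function name `density_map`, the default `sigma`, the 3-sigma truncation, and the renormalization of kernels clipped at the image border are all illustrative assumptions rather than details taken from the paper.

```python
import math


def density_map(points, height, width, sigma=4.0):
    """Build a density map from head annotations with one fixed Gaussian.

    points: iterable of (col, row) integer pixel coordinates (assumed format).
    Returns a height x width list of lists whose sum is approximately the
    number of annotated heads that fall on or near the image.
    """
    radius = int(3 * sigma)  # assumption: truncate the kernel at 3 sigma
    dmap = [[0.0] * width for _ in range(height)]
    for col, row in points:
        # Collect the kernel weights that land inside the image.
        taps = []
        for r in range(max(0, row - radius), min(height, row + radius + 1)):
            for c in range(max(0, col - radius), min(width, col + radius + 1)):
                d2 = (r - row) ** 2 + (c - col) ** 2
                taps.append((r, c, math.exp(-d2 / (2 * sigma ** 2))))
        if not taps:
            continue  # annotation lies entirely outside the image
        # Assumption: renormalize so every head contributes exactly 1,
        # even when the kernel is clipped by the border.
        total = sum(w for _, _, w in taps)
        for r, c, w in taps:
            dmap[r][c] += w / total
    return dmap


if __name__ == "__main__":
    heads = [(10, 10), (0, 0), (30, 20)]
    dmap = density_map(heads, height=32, width=48, sigma=2.0)
    # The integral of the density map recovers the crowd count.
    print(round(sum(map(sum, dmap)), 6))
```

Because `sigma` is the same for every head, a person near the camera and a person far away receive identically sized blobs. This is the scale mismatch under perspective that the abstract identifies as the limitation of fixed-kernel labels.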
computer science, artificial intelligence