Crowd Transformer Network

Viresh Ranjan, Mubarak Shah, Minh Hoai Nguyen
DOI: https://doi.org/10.48550/arXiv.1904.02774
2019-04-05
Abstract: In this paper, we tackle the problem of crowd counting and present a crowd density estimation based approach for obtaining the crowd count. Most existing crowd counting approaches rely on local features for estimating the crowd density map. In this work, we investigate the usefulness of combining local with non-local features for crowd counting. We use convolution layers for extracting local features, and a type of self-attention mechanism for extracting non-local features. We combine the local and the non-local features, and use them for estimating the crowd density map. We conduct experiments on three publicly available crowd counting datasets, and achieve significant improvement over the previous approaches.
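The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract only says "a type of self-attention mechanism", so the scaled dot-product attention used here, the single 3x3 convolution, the concatenation of the two feature sets, the random weights, and every name (local_features, non_local_features, density_map, params) are assumptions made for illustration.

```python
import numpy as np

def local_features(x, kernel):
    """3x3 'same' convolution over an (H, W, C_in) map, followed by ReLU.
    kernel has shape (3, 3, C_in, C_out). Each output sees only a 3x3 patch."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, kernel.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w] @ kernel[dy, dx]
    return np.maximum(out, 0.0)

def non_local_features(feats, w_q, w_k, w_v):
    """Scaled dot-product self-attention over all spatial positions
    (assumed form; the abstract does not specify the attention variant)."""
    h, w, c = feats.shape
    tokens = feats.reshape(h * w, c)              # one token per pixel
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (HW, HW) pairwise scores
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over positions
    return (attn @ v).reshape(h, w, -1)

def density_map(image, params):
    """Combine local and non-local features, then regress a density map."""
    local = local_features(image, params["conv"])
    non_local = non_local_features(
        local, params["w_q"], params["w_k"], params["w_v"])
    combined = np.concatenate([local, non_local], axis=-1)  # assumed fusion
    # A per-pixel linear layer plus ReLU keeps the density non-negative.
    return np.maximum(combined @ params["w_out"], 0.0)[..., 0]

# Hypothetical untrained weights, only to show shapes and data flow.
rng = np.random.default_rng(0)
c_in, c_feat = 3, 8
params = {
    "conv": rng.normal(size=(3, 3, c_in, c_feat)) * 0.1,
    "w_q": rng.normal(size=(c_feat, c_feat)) * 0.1,
    "w_k": rng.normal(size=(c_feat, c_feat)) * 0.1,
    "w_v": rng.normal(size=(c_feat, c_feat)) * 0.1,
    "w_out": rng.normal(size=(2 * c_feat, 1)) * 0.1,
}
image = rng.random((6, 5, c_in))
density = density_map(image, params)
count = density.sum()   # the crowd count is the integral of the density map
```

In this sketch the convolution output at a pixel depends only on its 3x3 neighbourhood, while each attention output is a weighted sum over every position in the image, which is the local versus non-local distinction the abstract draws. With untrained random weights the resulting count is meaningless; only the data flow is illustrated.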
Computer Vision and Pattern Recognition