Crowd Counting in Large Surveillance Areas by Fusing Audio and WiFi Sniffing Data

Rui Guo, Baoqi Huang, Lifei Hao, Bing Jia
DOI: https://doi.org/10.1109/ijcnn60899.2024.10651535
2024-01-01
Abstract: Popular vision-based crowd counting methods suffer from high costs, limited coverage and high complexity, which makes them difficult to apply to large surveillance areas. Emerging WiFi-based methods are suitable for large surveillance areas but achieve limited accuracy because WiFi sniffing data are sparse and random. Since variations in audio data are spatio-temporally correlated with crowd fluctuations, this paper proposes fusing audio and WiFi sniffing data for crowd counting through a Cross-modal Multi-level Perception Network (CMPN). CMPN not only extracts crowd features from the bimodal data, leveraging temporal continuity to compensate for sparse WiFi sniffing data, but also mines the correlations among intra- and inter-modality crowd features for accurate crowd counting. Extensive experiments conducted on a real campus with a surveillance area of about 4000 m² show that CMPN achieves a mean absolute error (MAE) of 5.88, a 22.12% reduction compared with the state-of-the-art WiFi-only method.
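The two headline figures can be related with a short calculation. Assuming the 22.12% reduction is relative to the WiFi-only method's MAE (the abstract does not state the baseline value), that baseline would be roughly 5.88 / (1 − 0.2212) ≈ 7.55. The sketch below defines MAE over per-interval crowd counts and back-computes the implied baseline; the function names and the toy counts are hypothetical and only illustrate the metric, not the paper's evaluation code.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE between ground-truth and predicted crowd counts (hypothetical helper)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def implied_baseline_mae(new_mae, relative_reduction):
    """Baseline MAE, assuming new_mae = baseline * (1 - relative_reduction)."""
    return new_mae / (1.0 - relative_reduction)


# Toy counts for four time intervals (illustrative only, not from the paper).
toy_mae = mean_absolute_error([10, 20, 30, 40], [12, 18, 33, 40])

# Figures reported in the abstract.
baseline = implied_baseline_mae(5.88, 0.2212)
print(f"toy MAE = {toy_mae:.2f}")                    # toy MAE = 1.75
print(f"implied WiFi-only MAE = {baseline:.2f}")     # implied WiFi-only MAE = 7.55
```

Under this assumption, CMPN lowers the average per-interval counting error by about 1.67 people relative to the WiFi-only method.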