MCNet: A crowd density estimation network based on integrating multiscale attention module

Qiang Guo, Rubo Zhang, Di Zhao
2024-03-29
Abstract: Because existing metro video surveillance systems have not effectively solved the problem of metro crowd density estimation, this paper proposes a Metro Crowd density estimation Network (MCNet) that automatically classifies the crowd density level of passengers. First, an Integrating Multi-scale Attention (IMA) module is proposed to enhance the ability of plain classifiers to extract semantic crowd texture features, adapting them to the characteristics of crowd textures. The novelty of the IMA module lies in fusing dilated convolution, multi-scale feature extraction, and an attention mechanism to obtain multi-scale crowd feature activations from a larger receptive field at lower computational cost, and to strengthen the crowd activation of convolutional features in the top layers. Second, a novel lightweight crowd texture feature extraction network is proposed that directly processes video frames and automatically extracts texture features for crowd density estimation; its fast image processing speed and small number of network parameters make it flexible to deploy on embedded platforms with limited hardware resources. Finally, the paper integrates the IMA module and the lightweight crowd texture feature extraction network to construct MCNet, and validates it on the image classification dataset CIFAR-10 and on four crowd density datasets (PETS2009, Mall, QUT, and SH_METRO) to assess whether MCNet is a suitable solution for crowd density estimation in metro video surveillance, where image processing faces challenges such as high density, heavy occlusion, perspective distortion, and limited hardware resources.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses the ineffective estimation of passenger density in metro video surveillance systems. It proposes MCNet (Metro Crowd density estimation Network), which automatically classifies passenger density into levels. The work focuses on two aspects:

1. **Extracting multi-scale crowd texture features**: Traditional methods struggle to extract crowd texture features in complex scenes with high density, heavy occlusion, and perspective distortion. The paper therefore proposes an Integrating Multi-scale Attention (IMA) module that combines dilated convolution, multi-scale feature extraction, and attention mechanisms to strengthen the network's ability to extract crowd features.
2. **Reducing the computational burden**: Existing crowd density estimation methods based on deep convolutional neural networks (CNNs) demand substantial computational resources, which makes them difficult to deploy on embedded devices. The paper designs a lightweight crowd texture feature extraction network that can run on resource-limited embedded platforms while maintaining high prediction accuracy and inference speed.

With these contributions, MCNet performs well on multiple benchmark datasets (CIFAR-10 for image classification, plus the PETS2009, Mall, QUT, and SH_METRO crowd density datasets) and shows practical potential in metro scenarios: it can estimate passenger density in metro carriages and on platforms accurately and in real time, providing decision support for metro management personnel.
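The summary names the IMA module's ingredients (dilated convolution, multi-scale feature extraction, attention) but not its exact wiring. The following is a minimal, dependency-free Python sketch of how such ingredients can combine, under loudly stated assumptions: three parallel 3×3 branches with dilation rates 1, 2, and 3 sharing one single-channel kernel, and a softmax over each branch's average response standing in for the attention step. The function names `effective_size`, `dilated_conv2d`, and `multiscale_attention`, the dilation rates, the 64-channel widths, and the softmax attention are all hypothetical illustrations, not the authors' MCNet code; a real module would use learned multi-channel kernels in a deep learning framework.

```python
import math


def effective_size(k, d):
    # A k x k kernel with dilation d spans k + (k - 1)(d - 1) pixels per side,
    # yet it still holds only k * k weights per channel pair.
    return k + (k - 1) * (d - 1)


def dilated_conv2d(img, kernel, d):
    """Zero-padded, stride-1 'same' 2D cross-correlation of one channel with dilation d."""
    h, w, k = len(img), len(img[0]), len(kernel)
    c = k // 2  # index of the kernel's centre tap
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Taps are spaced d pixels apart around the output position.
                    y, x = i + (a - c) * d, j + (b - c) * d
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[a][b] * img[y][x]
            out[i][j] = acc
    return out


def multiscale_attention(img, kernel, dilations=(1, 2, 3)):
    """HYPOTHETICAL IMA-style block: parallel dilated branches fused by softmax attention."""
    branches = [dilated_conv2d(img, kernel, d) for d in dilations]
    h, w = len(img), len(img[0])
    # Branch descriptor: global average of each branch's response map.
    scores = [sum(map(sum, br)) / (h * w) for br in branches]
    # Numerically stable softmax turns descriptors into per-branch weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the branch maps gives the fused multi-scale response.
    fused = [[sum(wt * br[i][j] for wt, br in zip(weights, branches))
              for j in range(w)] for i in range(h)]
    return fused, weights


if __name__ == "__main__":
    # Impulse input: each branch's response reveals where its kernel samples.
    img = [[0.0] * 9 for _ in range(9)]
    img[4][4] = 1.0
    ones = [[1.0] * 3 for _ in range(3)]
    c_in = c_out = 64  # assumed channel widths for the weight count
    for d in (1, 2, 3):
        s = effective_size(3, d)
        print(f"dilation={d}: window {s}x{s}, "
              f"dilated weights {3 * 3 * c_in * c_out}, dense weights {s * s * c_in * c_out}")
    fused, weights = multiscale_attention(img, ones)
    print("branch attention weights:", [round(wt, 3) for wt in weights])
```

The weight count shows the cost argument behind dilation: every branch keeps 3 × 3 = 9 taps per channel pair while its window grows from 3 to 5 to 7 pixels, whereas an undilated 7 × 7 kernel covering the same window would need 49 taps.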