Double multi-scale feature fusion network for crowd counting
Qian Liu, Jiongtao Fang, Yixiong Zhong, Cunbao Wang, Youwei Qi
DOI: https://doi.org/10.1007/s11042-024-18769-w
IF: 2.577
2024-03-08
Multimedia Tools and Applications
Abstract: Crowd counting has attracted increasing research attention in recent years but still faces many challenges, such as crowded scenes, scale variations and cluttered backgrounds. With the development of deep learning, density maps are widely used for crowd counting, and the quality of the density map plays a crucial role in counting performance. In this paper, we propose a new convolutional network architecture, called the double multi-scale feature fusion network (DMFFNet), to generate high-quality density maps and accurate count estimates. DMFFNet uses VGG19 to extract multi-scale feature maps from input images. The features from the last three scales are each passed through a purpose-built dilated feature pyramid module that enlarges their receptive fields, and the three outputs are then fused together. Moreover, a feature enhancement module composed of spatial attention and channel-wise attention weights the fused feature maps to distinguish effectively between crowd and background. We also design a new dual-scale loss to optimize the network during training. Experimental results show that DMFFNet reduces MAE by at least 1.5, 1.5, 1.2, 0.6 and 0.5 on the UCF_CC_50, UCF-QNRF, JHU-Crowd++, ShanghaiTech Part A and ShanghaiTech Part B datasets, respectively, and decreases MSE by at least 1.8 and 0.1 on the JHU-Crowd++ and ShanghaiTech Part B datasets, respectively, compared with state-of-the-art methods.
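The abstract reports results as MAE and MSE over counts derived from density maps, without defining either. The sketch below illustrates how such numbers are usually obtained, assuming the common crowd-counting convention: the predicted count is the sum of the density map, and "MSE" is the root of the mean squared count error. The function names and the toy values are hypothetical and are not taken from the paper.

```python
import math


def count_from_density(density_map):
    """Estimated head count = sum of all values in a 2D density map."""
    return sum(sum(row) for row in density_map)


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    n = len(true_counts)
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / n


def mse(pred_counts, true_counts):
    """Root of the mean squared count error.

    Assumption: this follows the usual crowd-counting convention, where
    the metric labelled "MSE" includes the square root.
    """
    n = len(true_counts)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / n)


# Toy, made-up density map whose values integrate to two people.
toy_density = [
    [0.0, 0.5, 0.5],
    [0.25, 0.75, 0.0],
]
toy_count = count_from_density(toy_density)

# Toy, made-up per-image counts for three images.
pred = [toy_count, 10.0, 7.0]
true = [3, 10, 5]
print(mae(pred, true), mse(pred, true))
```

Under this convention, an MAE reduction of 0.5 means that predicted counts are on average half a person closer to the ground truth per image.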
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering