Multi-Scale Infrared Pedestrian Detection Based on Deep Attention Mechanism

Zhao Bin, Wang Chunping, Fu Qiang, Chen Yichao
DOI: https://doi.org/10.3788/aos202040.0504001
2020-01-01
Acta Optica Sinica
Abstract: This paper proposes a multi-scale infrared pedestrian detection method based on a deep attention mechanism. The lightweight Darknet53 is adopted as the backbone network for deep convolutional feature extraction, and a four-scale feature pyramid network is constructed to classify and localize objects. Introducing low-level, high-resolution feature maps improves detection performance on small-scale objects such as pedestrians. Furthermore, an attention module is designed to replace the traditional upsampling block in the feature pyramid network. It generates a local saliency map from the convolutional features, thereby suppressing the feature responses of unrelated areas and highlighting the local features of the image. Finally, the Caltech pedestrian and U-FOV infrared pedestrian datasets are used for two-step transfer learning to ensure the generalization of the proposed model and to strengthen the learned pedestrian features. The results show that the average precision of the proposed method is 93.45% on the U-FOV dataset, which is 26.74 percentage points higher than that obtained using YOLOv3, and that the smallest detectable pedestrian measures 6 × 13 pixels. In addition, qualitative experimental results on the LTIR dataset confirm that the proposed model generalizes well, which makes it suitable for multi-scale infrared pedestrian detection.
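The abstract does not describe the internals of the attention module, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes one plausible reading of an attention block that replaces plain upsampling: the deeper feature map is upsampled by nearest-neighbour interpolation, a per-pixel saliency value in (0, 1) is computed as the sigmoid of a 1 × 1 convolution across channels, and the upsampled features are scaled by that saliency. The function names, the `weights` and `bias` parameters, and the choice of nearest-neighbour interpolation are all assumptions made for illustration.

```python
import math


def upsample_nearest(fmap, factor=2):
    """Nearest-neighbour upsampling of a C x H x W feature map (nested lists)."""
    return [
        [
            [row[j // factor] for j in range(len(row) * factor)]
            for row in channel
            for _ in range(factor)  # repeat each row `factor` times
        ]
        for channel in fmap
    ]


def saliency_map(fmap, weights, bias=0.0):
    """Per-pixel saliency in (0, 1): sigmoid of a 1x1 convolution over channels.

    `weights` holds one (hypothetical) learned coefficient per channel.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [
            1.0 / (1.0 + math.exp(-(bias + sum(w * ch[i][j]
                                               for w, ch in zip(weights, fmap)))))
            for j in range(width)
        ]
        for i in range(height)
    ]


def attention_upsample(deep, weights, bias=0.0, factor=2):
    """Upsample `deep`, then scale every channel by the shared saliency map."""
    up = upsample_nearest(deep, factor)
    sal = saliency_map(up, weights, bias)
    return [
        [[v * s for v, s in zip(row, srow)] for row, srow in zip(channel, sal)]
        for channel in up
    ]


# Toy input: 2 channels, 1 x 2 spatial size; the right pixel responds strongly.
deep = [[[0.0, 4.0]], [[0.0, 4.0]]]
out = attention_upsample(deep, weights=[1.0, 1.0], bias=-4.0)
```

In this toy input the weakly responding location receives a saliency close to 0 and the strongly responding one a saliency close to 1, which mirrors the abstract's description of suppressing unrelated areas while highlighting local features. In a trained network the weights would be learned rather than set by hand.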