SARNet: Spatial Attention Residual Network for pedestrian and vehicle detection in large scenes

Hongyang Wei, Qianqian Zhang, Jingjing Han, Yingying Fan, Yurong Qian
DOI: https://doi.org/10.1007/s10489-022-03217-9
IF: 5.3
2022-04-04
Applied Intelligence
Abstract: With the development of high-resolution camera technology, a single shot can now cover a scene on the square-kilometer scale, thousands of people can be observed at the same time, and faces a hundred meters away remain clearly recognizable. Images captured by high-resolution cameras are therefore very different from those captured by conventional cameras. High-resolution images contain many detection targets, target scales vary widely with spatial position, and target overlap and occlusion make feature extraction difficult and degrade detection results. To address these problems, this paper proposes SARNet, a multi-target detection method that uses spatial attention to optimize feature extraction. Spatial attention is used to optimize the backbone network and expand the local receptive field, which enhances its representation ability and its ability to extract features of small targets. The features at different scales from the dilated feature pyramid network are then passed through deformable region-of-interest pooling, which effectively improves detection accuracy across scales. Experimental results show that the proposed method achieves 51.9% mAP on the PANDA dataset, outperforming existing detection algorithms. Experiments on pedestrians and vehicles in the COCO2017 dataset further demonstrate the feasibility of the method.
computer science, artificial intelligence
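The abstract mentions expanding the local receptive field and a dilated feature pyramid network. As a rough illustration of why dilation enlarges the receptive field, the sketch below applies the standard receptive-field arithmetic for stacked convolutions. The layer configurations and the function names `effective_kernel` and `receptive_field` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers) -> int:
    """Receptive field (in input pixels) of a stack of conv layers.

    layers: sequence of (kernel_size, stride, dilation), ordered input to output.
    """
    rf = 1    # a single output pixel initially sees one input pixel
    jump = 1  # distance in input pixels between adjacent feature-map positions
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks (assumed values, not SARNet's actual configuration).
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]    # three ordinary 3x3 convs
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]  # same depth, dilations 1, 2, 4

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of layers and parameters, the dilated stack in this toy setting covers roughly twice the input span in each dimension, which is the general effect the abstract appeals to for small and multi-scale targets.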