From Macro to Micro: Rethinking Multi-Scale Pedestrian Detection.
Yuzhe He, Ning He, Haigang Yu, Ren Zhang, Kang Yan
DOI: https://doi.org/10.1007/s00530-023-01058-1
IF: 3.9
2023-01-01
Multimedia Systems
Abstract: Pedestrian detection uses computer vision techniques to determine whether pedestrians are present in an image or video sequence and to localize them precisely, but variation in pedestrian scale has long been a difficult problem for this task. In contrast to existing research, this study considers multi-scale pedestrian detection jointly at the macro- and micro-levels. At the macro-level, the shape and location of each anchor are predicted from the feature maps to guide anchor generation, so that the resulting anchors adapt better to pedestrian targets at different scales. At the micro-level, the standard convolution in the backbone network is replaced with switchable atrous convolution, which addresses the scale differences between pedestrians. Finally, a Double Head completes the classification and regression tasks in pedestrian detection more efficiently. These elements are combined into a multi-scale pedestrian detection network, and experimental results show that the proposed model substantially improves multi-scale pedestrian detection performance. On the COCOPersons dataset, the detection accuracy reaches an average precision (AP) of 57.3. Compared with Faster R-CNN based on a feature pyramid network, the accuracy of our model at large, medium, and small scales improves by 1.7 AP, 2.5 AP, and 6.8 AP, respectively. On the Caltech pedestrian dataset, the MR^2 of Near, Medium and Far subsets reach 0.45 MR^2 of Small, Medium and Large subsets reach 12.1
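The micro-level idea in the abstract, blending convolutions with different atrous (dilation) rates through an input-dependent switch, can be illustrated with a minimal one-dimensional sketch in plain Python. This is not the authors' implementation. The function names, the rates 1 and 3, the 5-sample averaging window, and the sigmoid gate are all assumptions made for illustration, and the two branches here share one kernel unchanged.

```python
import math


def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D cross-correlation with zero padding and the given dilation."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * dilation  # taps are spaced `dilation` samples apart
            if 0 <= j < len(signal):       # positions outside the signal count as zero
                acc += weight * signal[j]
        out.append(acc)
    return out


def switch_gate(signal, window=5):
    """Hypothetical switch: sigmoid of a local mean, giving one value in (0, 1) per position."""
    half = window // 2
    gates = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        local_mean = sum(signal[lo:hi]) / (hi - lo)
        gates.append(1.0 / (1.0 + math.exp(-local_mean)))
    return gates


def switchable_atrous_conv1d(signal, kernel, small_rate=1, large_rate=3):
    """Blend a small-rate and a large-rate convolution position by position."""
    small = dilated_conv1d(signal, kernel, small_rate)
    large = dilated_conv1d(signal, kernel, large_rate)
    gates = switch_gate(signal)
    return [g * s + (1.0 - g) * l for g, s, l in zip(gates, small, large)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0, 0]
    print(dilated_conv1d(impulse, [1, 1, 1], 1))  # narrow receptive field
    print(dilated_conv1d(impulse, [1, 1, 1], 3))  # wide receptive field
    print(switchable_atrous_conv1d(impulse, [1, 1, 1]))
```

With the impulse input, the rate-1 branch responds only next to the impulse while the rate-3 branch responds three samples away, so the gate decides per position how much of the wider receptive field is used.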