Context feature fusion and enhanced non-maximum suppression for pedestrian detection in crowded scenes
Yu Shao, Jianhua Hu, Lihua Hu, Jifu Zhang, Xinbo Wang
DOI: https://doi.org/10.1007/s11042-024-18865-x
IF: 2.577
2024-03-17
Multimedia Tools and Applications
Abstract: Pedestrian detection has a wide range of applications in multimedia, and significant progress has been made. However, two problems persist in densely populated scenes: occlusion and the erroneous suppression of overlapping bounding boxes. Both lead to false positives and false negatives and thereby degrade overall performance. To tackle these problems, we first propose the Context Feature Fusion Module (CFFM), which leverages contextual information to capture correlations between pedestrians and backgrounds and thus alleviates the loss of crucial features caused by occlusion. Second, we propose Distance Set Non-Maximum Suppression (DSNMS), which combines the Intersection over Union (IoU) with the distance between the center points of overlapping bounding boxes to reduce the erroneous suppression of overlapping boxes. Extensive experiments on the CrowdHuman dataset show that our method achieves an Average Precision (AP) of 91.22%, a log-average miss rate (MR^-2) of 40.26%, and a Jaccard Index (JI) of 83.54%. Visualization results on real-world scenes further validate the efficacy of the proposed method.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
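
The abstract describes DSNMS only at a high level: suppression decisions use both IoU and the distance between box centers. The sketch below illustrates that general idea and is not the authors' DSNMS. It assumes a DIoU-style criterion, in which the IoU is reduced by the squared center distance normalized by the squared diagonal of the smallest enclosing box, so two heavily overlapping boxes with well-separated centers are less likely to suppress each other. The function names, the box format (x1, y1, x2, y2), and the threshold values are all hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_penalty(a, b):
    """Squared center distance divided by the squared diagonal of the
    smallest box enclosing both a and b (assumed DIoU-style term)."""
    dx = (a[0] + a[2]) / 2 - (b[0] + b[2]) / 2
    dy = (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2
    ew = max(a[2], b[2]) - min(a[0], b[0])
    eh = max(a[3], b[3]) - min(a[1], b[1])
    diag_sq = ew * ew + eh * eh
    return (dx * dx + dy * dy) / diag_sq if diag_sq > 0 else 0.0


def distance_aware_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a box only when IoU minus the center
    penalty exceeds the threshold. Returns kept indices, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        suppressed = any(
            iou(boxes[i], boxes[k]) - center_penalty(boxes[i], boxes[k]) > threshold
            for k in keep
        )
        if not suppressed:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two near-duplicate detections plus one distant detection.
    boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(distance_aware_nms(boxes, scores))
```

With a threshold chosen near the IoU of two offset boxes, the center penalty can tip the decision toward keeping both boxes, which is the behavior wanted for adjacent pedestrians in a crowd. In practice the threshold would need tuning on validation data.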