Cross stage partial connections based weighted Bi-directional feature pyramid and enhanced spatial transformation network for robust object detection

Yan-Feng Lu, Qian Yu, Jing-Wen Gao, Yi Li, Jun-Cheng Zou, Hong Qiao
DOI: https://doi.org/10.1016/j.neucom.2022.09.117
IF: 6
2022-11-07
Neurocomputing
Abstract: Structural information is essential for efficient object detection. In many visual detection tasks, objects with large structural deformation make up a large proportion of the targets. The shape, contour, and internal structure of such objects can change dramatically, which makes them difficult to detect. Detecting these objects robustly and accurately is therefore a significant challenge. To address this issue, we first introduce a Cross Stage Partial connections-based weighted Bi-directional Feature Pyramid Network (CSP-BiFPN), which enables easy and efficient multi-scale feature fusion through cross-stage partial connections. Second, to enhance the model's spatial transformation capacity, the multi-scale feature maps extracted from the YOLO backbone network are processed by an enhanced spatial transformation network (ESTN) to handle spatial deformations. Based on these architectural modifications and optimizations, we further develop a novel real-time robust object detection model called Bi-STN-YOLO. We evaluate the proposed method on four image datasets. The experimental results demonstrate that the proposed approach achieves significant improvements over typical YOLO-family models and competitive performance compared with state-of-the-art methods in detection tasks.
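The abstract does not state which rule the weighted fusion in CSP-BiFPN uses. As a rough illustration only, the sketch below assumes the "fast normalized fusion" commonly associated with BiFPN: each input feature map gets a learnable scalar weight, the weights are clipped to be non-negative, and they are normalized to sum to roughly one before the maps are added. The function name `fast_normalized_fusion`, the `eps` value, and the flattened toy feature maps are hypothetical, and the cross-stage partial connections and convolutions of the actual model are omitted.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature maps using non-negative, normalized weights.

    features: list of feature maps, each flattened to a list of floats
              (assumed already resized to a common resolution).
    weights:  one scalar per feature map (learnable in a real network).
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")

    # Clip at zero (ReLU) so that no input contributes negatively.
    clipped = [max(0.0, w) for w in weights]

    # Normalize; eps avoids division by zero when all weights are clipped.
    total = sum(clipped) + eps
    normalized = [w / total for w in clipped]

    # Element-wise weighted sum across the input feature maps.
    length = len(features[0])
    return [
        sum(n * fmap[i] for n, fmap in zip(normalized, features))
        for i in range(length)
    ]


# Toy example: two "feature maps" from different pyramid levels.
p_shallow = [1.0, 2.0, 3.0]
p_deep = [3.0, 2.0, 1.0]

# Equal weights give approximately the element-wise mean.
fused_equal = fast_normalized_fusion([p_shallow, p_deep], [1.0, 1.0])

# A negative weight is clipped to zero, so only the second map contributes.
fused_clipped = fast_normalized_fusion([p_shallow, p_deep], [-1.0, 2.0])
```

Normalizing by a plain sum rather than a softmax keeps the weights bounded at low cost, which is one reason this kind of fusion suits real-time detectors.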
computer science, artificial intelligence