Cascaded feature fusion with multi-level self-attention mechanism for object detection

Chuanxu Wang, Huiru Wang
DOI: https://doi.org/10.1016/j.patcog.2023.109377
IF: 8
2023-02-11
Pattern Recognition
Abstract: Object detection remains a challenging task because of the complexity and diversity of objects. The emergence of self-attention mechanisms offers a new approach to feature fusion in object detection. Most existing self-attention mechanisms focus on extracting correlations between global and local information in space or among channels; however, how to fuse all of these features effectively remains an open problem. To address this problem, we propose a Pooling and Global feature Fusion Self-attention Mechanism (PGFSM) that captures multi-level correlations among a variety of features and performs cascaded aggregation over them. PGFSM consists of three parts: the Spatial Self-attention Pooling Fusion Module (SSPFM), the Channel Self-attention Pooling Fusion Module (CSPFM), and the Spatial and Channel Global Self-attention Fusion Module (SCGSFM). SSPFM and CSPFM operate in the spatial and channel dimensions respectively, extracting self-attention features from global max pooling and global average pooling; SCGSFM extracts the fused spatial and channel feature relationships at the global level. Finally, the three fused feature relations are added to the original feature to obtain an enhanced feature representation. In our experiments, PGFSM is embedded into YOLOv4, YOLOv5, and EfficientDet, and evaluated on the PASCAL VOC and MS COCO datasets. The experimental results show that the feature fusion self-attention mechanism improves object detection performance relative to each original framework and to state-of-the-art modules, demonstrating the effectiveness of our method.
computer science, artificial intelligence; engineering, electrical & electronic
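
The three-branch structure described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract does not give the internal design of SSPFM, CSPFM, or SCGSFM, so every function name, the sigmoid gating used for the pooling branches, and the dot-product attention used for the global branch are assumptions chosen only to show how three fused feature maps might be added back onto the original feature.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_pool_branch(x):
    # Stand-in for SSPFM (assumed design): pool across channels with
    # average and max, then gate every spatial location.
    avg = x.mean(axis=0)            # (H, W)
    mx = x.max(axis=0)              # (H, W)
    weights = sigmoid(avg + mx)     # (H, W)
    return x * weights[None, :, :]

def channel_pool_branch(x):
    # Stand-in for CSPFM (assumed design): pool across space with
    # average and max, then gate every channel.
    avg = x.mean(axis=(1, 2))       # (C,)
    mx = x.max(axis=(1, 2))         # (C,)
    weights = sigmoid(avg + mx)     # (C,)
    return x * weights[:, None, None]

def global_branch(x):
    # Stand-in for SCGSFM (assumed design): dot-product self-attention
    # over spatial positions and over channels, summed together.
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                                  # (C, HW)
    attn_s = softmax(flat.T @ flat / np.sqrt(c), axis=-1)       # (HW, HW)
    spatial = flat @ attn_s.T                                   # (C, HW)
    attn_c = softmax(flat @ flat.T / np.sqrt(h * w), axis=-1)   # (C, C)
    channel = attn_c @ flat                                     # (C, HW)
    return (spatial + channel).reshape(c, h, w)

def pgfsm_sketch(x):
    # Add the three fused feature maps onto the original feature map.
    return x + spatial_pool_branch(x) + channel_pool_branch(x) + global_branch(x)

# Usage: a random feature map with 4 channels and 5x6 spatial size.
features = np.random.default_rng(0).standard_normal((4, 5, 6))
enhanced = pgfsm_sketch(features)
```

The output has the same shape as the input, which is what would allow a module of this kind to be dropped into an existing detector such as the ones named in the abstract without changing the surrounding layers.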