Abstract: In object detection, information in a C×W×H feature map is usually fused across channels and spatial positions with an attention mechanism: all C channels and the entire W×H space are each compressed by average or max pooling, and the attention weight masks are then obtained from a correlation calculation. This coarse-grained global operation ignores the differences among channels and among spatial regions, which results in inaccurate attention weights. Mining the contextual information in the W×H space is a further challenge for object recognition and localization. To address both problems, we propose a Fine-Grained Dual Level Attention Mechanism joint Spacial Context Information Fusion module for object detection (FGDLAM SCIF). The module has a cascaded structure. First, we subdivide the feature space W×H into n subspaces (n = 4 was selected in our experiments) and construct a global adaptive pooling and one-dimensional convolution algorithm to extract the feature channel weights on each subspace separately. Second, the C feature channels are divided into n (n = 4) channel groups, and a multi-scale module is constructed over the feature space W×H to mine context information. Finally, row and column coding is used to fuse the results orthogonally and obtain enhanced features. The module is embeddable: it can be transplanted into object detection networks such as YOLOv4/v5, PPYOLOE, and YOLOX, as well as backbones such as MobileNet and ResNet. Experiments on the MS COCO 2017 and Pascal VOC 2007 datasets verify its effectiveness and portability.
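The first stage described in the abstract (per-subspace channel weighting via global pooling and a one-dimensional convolution) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The 2×2 grid split, the use of average pooling, the fixed uniform kernel of size 3 standing in for learned weights, the sigmoid gate, and the function names `conv1d_same` and `subspace_channel_attention` are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1d_same(v, kernel):
    """Slide an odd-length kernel along the channel axis with zero padding.

    This is cross-correlation, as in deep-learning conv layers; the output
    has the same length as the input.
    """
    k = len(kernel)
    padded = np.pad(v, k // 2)
    return np.array([np.dot(padded[i:i + k], kernel) for i in range(len(v))])


def subspace_channel_attention(x, grid=(2, 2), kernel=None):
    """Reweight channels separately inside each spatial subspace.

    x: feature map of shape (C, H, W).
    grid: hypothetical split of H x W into rows x cols subspaces (2 x 2 gives n = 4).
    kernel: stand-in for the learned 1-D convolution weights (assumption).
    Returns the reweighted map and the weights of shape (rows, cols, C).
    """
    if kernel is None:
        kernel = np.full(3, 1.0 / 3.0)  # placeholder, not a trained kernel
    c, h, w = x.shape
    rows, cols = grid
    h_edges = np.linspace(0, h, rows + 1).astype(int)
    w_edges = np.linspace(0, w, cols + 1).astype(int)
    out = np.empty_like(x, dtype=float)
    weights = np.empty((rows, cols, c))
    for i in range(rows):
        for j in range(cols):
            hs, he = h_edges[i], h_edges[i + 1]
            ws, we = w_edges[j], w_edges[j + 1]
            region = x[:, hs:he, ws:we]
            pooled = region.mean(axis=(1, 2))  # global average pooling per channel
            weights[i, j] = sigmoid(conv1d_same(pooled, kernel))
            out[:, hs:he, ws:we] = region * weights[i, j][:, None, None]
    return out, weights


# Example: an 8-channel 6x6 feature map split into four 3x3 subspaces.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 6, 6))
enhanced, channel_weights = subspace_channel_attention(features)
```

Because each subspace is pooled on its own, the same channel can receive a different weight in each region, which is the fine-grained behavior the abstract contrasts with a single global pooling step.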

Feature refinement with multi-level context for object detection

Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation

Feature Refinement from Multiple Perspectives for High Performance Salient Object Detection

Multi-scale Fusion with Context-aware Network for Object Detection

Context Refinement for Object Detection

MC-Refine: Enhanced Cross-modal 3-D Object Detection Via Multi-stage Cross-scale Fusion and Box Refinement

Research of improving semantic image segmentation based on a feature fusion model

Spatial Attention for Multi-Scale Feature Refinement for Object Detection

Multi-scale Feature and Spatial Relation Inference for Object Detection

Improving Multiscale Object Detection With Off-Centered Semantics Refinement

Multi-scale Context Enhancement Network for Object Detection

Multi-branch feature fusion and refinement network for salient object detection

Multi-scale feature selection and fusion for object detection

Adaptive Multilevel Fusion Refinement Network for Object Detection in Remote Sensing Images

Multilevel feature fusion dilated convolutional network for semantic segmentation

Enriched Feature Guided Refinement Network for Object Detection

Feature Enhancement for Multi-scale Object Detection

Multi-scale iterative refinement network for RGB-D salient object detection

Object Detection Using Deep Learning: Single Shot Detector with a Refined Feature-fusion Structure

Fine Grained Dual Level Attention Mechanisms with Spacial Context Information Fusion for Object Detection

Cross-modal and multi-level feature refinement network for RGB-D salient object detection