Abstract: Object detection is an important component of computer vision. Most recent successful object detection methods are based on convolutional neural networks (CNNs). To improve the performance of these networks, researchers have designed many different architectures. They found that CNN performance benefits from carefully increasing the depth and width of the network structure with respect to the spatial dimension. Some researchers have exploited the cardinality dimension. Others have found that skip and dense connections also benefit performance. Recently, attention mechanisms on the channel dimension have gained popularity. SENet uses global average pooling to generate the input feature vector of its channel-wise attention unit. In this work, we argue that channel-wise attention can benefit from both global average pooling and global max pooling. We designed three novel attention units to improve the performance of a CNN: an adaptive channel-wise attention unit, an adaptive spatial-wise attention unit, and an adaptive domain attention unit. Instead of concatenating the two attention vectors generated by the two channel-wise attention sub-units, we weight them based on the output data of those sub-units. We integrated the proposed mechanism into the YOLOv3 and MobileNetv2 frameworks and tested the resulting networks on the KITTI and Pascal VOC datasets. The experimental results show that YOLOv3 with the proposed attention mechanism outperforms the original YOLOv3 by 2.9% and 1.2% mAP on the KITTI and Pascal VOC datasets, respectively. MobileNetv2 with the proposed attention mechanism outperforms the original MobileNetv2 by 1.7% mAP on the Pascal VOC dataset.
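The channel-wise idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the two attention vectors are weighted, so the softmax gate below, the SE-style bottleneck sub-units, the reduction ratio, and every name (`se_subunit`, `adaptive_channel_attention`, `params`, `gate`) are assumptions made for illustration, with random values standing in for learned weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_subunit(v, w1, w2):
    """SE-style bottleneck on a (C,) vector: FC -> ReLU -> FC -> sigmoid."""
    hidden = np.maximum(w1 @ v, 0.0)
    return sigmoid(w2 @ hidden)

def adaptive_channel_attention(x, params):
    """x has shape (C, H, W). Returns (reweighted map, attention vector, gate weights)."""
    avg_vec = x.mean(axis=(1, 2))   # global average pooling -> (C,)
    max_vec = x.max(axis=(1, 2))    # global max pooling     -> (C,)

    a_avg = se_subunit(avg_vec, *params["avg"])  # attention from the avg branch
    a_max = se_subunit(max_vec, *params["max"])  # attention from the max branch

    # Assumption: the two branch weights come from a softmax over a linear
    # score of both sub-unit outputs. The abstract only says the weighting
    # is based on the sub-unit outputs, not how it is computed.
    scores = params["gate"] @ np.concatenate([a_avg, a_max])  # shape (2,)
    scores = scores - scores.max()                            # numerical stability
    w = np.exp(scores) / np.exp(scores).sum()

    attn = w[0] * a_avg + w[1] * a_max       # weighted sum instead of concatenation
    return x * attn[:, None, None], attn, w  # broadcast attention over H and W

# Usage with random stand-in weights (hypothetical sizes: C=8, reduction ratio 4).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
params = {
    "avg": (rng.normal(size=(C // r, C)), rng.normal(size=(C, C // r))),
    "max": (rng.normal(size=(C // r, C)), rng.normal(size=(C, C // r))),
    "gate": rng.normal(size=(2, 2 * C)),
}
x = rng.normal(size=(C, H, W))
out, attn, w = adaptive_channel_attention(x, params)
```

Because the gate weights sum to one and each branch output passes through a sigmoid, the combined attention vector stays in the range [0, 1], so each channel of the feature map is scaled down or left roughly unchanged rather than amplified.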

Cascaded feature fusion with multi-level self-attention mechanism for object detection

Attention-based Fusion Factor in FPN for Object Detection

FFR-SSD: feature fusion and reconstruction single shot detector for multi-scale object detection

ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection

An Adaptive Attention Fusion Mechanism Convolutional Network for Object Detection in Remote Sensing Images

Joint-attention feature fusion network and dual-adaptive NMS for object detection

Pyramid attention object detection network with multi-scale feature fusion

Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery

Adaptively Attentional Feature Fusion Oriented to Multiscale Object Detection in Remote Sensing Images

Spatial attention-guided deformable fusion network for salient object detection

GHAFNet: Global-context hierarchical attention fusion method for traffic object detection

Multi-Modal Object Detection Method Based on Dual-Branch Asymmetric Attention Backbone and Feature Fusion Pyramid Network

MFIL-FCOS: A Multi-Scale Fusion and Interactive Learning Method for 2D Object Detection and Remote Sensing Image Detection

Multi-scale coupled attention for visual object detection

Object Detection With Extended Attention And Spatial Information

Object detection based on an adaptive attention mechanism

Background-Aware Cross-Attention Multiscale Fusion for Multispectral Object Detection

Research on Image Semantic Segmentation Based on Hybrid Cascade Feature Fusion and Detailed Attention Mechanism

DMFF: dual-way multimodal feature fusion for 3D object detection

PPF-Det: Point-Pixel Fusion for Multi-Modal 3D Object Detection

A Saliency Enhanced Feature Fusion based multiscale RGB-D Salient Object Detection Network