Abstract: Object detection is an important component of computer vision. Most recent successful object detection methods are based on convolutional neural networks (CNNs). To improve the performance of these networks, researchers have designed many different architectures. They found that CNN performance benefits from carefully increasing the depth and width of the network structure with respect to the spatial dimension. Some researchers have exploited the cardinality dimension. Others have found that skip and dense connections also benefit performance. Recently, attention mechanisms on the channel dimension have gained popularity. SENet uses global average pooling to generate the input feature vector of its channel-wise attention unit. In this work, we argue that channel-wise attention can benefit from both global average pooling and global max pooling. We designed three novel attention units, namely an adaptive channel-wise attention unit, an adaptive spatial-wise attention unit, and an adaptive domain attention unit, to improve the performance of a CNN. Instead of concatenating the two attention vectors generated by the two channel-wise attention sub-units, we weight the two attention vectors based on the output data of those sub-units. We integrated the proposed mechanism with the YOLOv3 and MobileNetv2 frameworks and tested the resulting networks on the KITTI and Pascal VOC datasets. The experimental results show that YOLOv3 with the proposed attention mechanism outperforms the original YOLOv3 by 2.9% and 1.2% mAP on the KITTI and Pascal VOC datasets, respectively. MobileNetv2 with the proposed attention mechanism outperforms the original MobileNetv2 by 1.7% mAP on the Pascal VOC dataset.
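The channel-wise idea in the abstract (two pooling branches whose attention vectors are weighted rather than concatenated) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the SE-style two-layer sub-unit, the softmax fusion rule over each branch's mean activation, the random weights, and all function and parameter names are assumptions made only for illustration.

```python
import math
import random


def global_avg_pool(x):
    """Squeeze a C x H x W feature map to one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def global_max_pool(x):
    """Squeeze a C x H x W feature map to one max value per channel."""
    return [max(max(row) for row in ch) for ch in x]


def dense(v, weights):
    """Bias-free fully connected layer: one output per weight row."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sub_unit(squeezed, w1, w2):
    """Assumed SE-style excitation: FC -> ReLU -> FC -> sigmoid."""
    hidden = [max(0.0, h) for h in dense(squeezed, w1)]
    return [sigmoid(z) for z in dense(hidden, w2)]


def adaptive_channel_attention(x, params):
    # One attention vector per pooling branch.
    a_avg = sub_unit(global_avg_pool(x), *params["avg"])
    a_max = sub_unit(global_max_pool(x), *params["max"])
    # Assumed fusion rule: softmax over the mean activation of each branch,
    # so the branch weights depend on the sub-unit outputs themselves.
    s_avg = sum(a_avg) / len(a_avg)
    s_max = sum(a_max) / len(a_max)
    e_avg, e_max = math.exp(s_avg), math.exp(s_max)
    k_avg, k_max = e_avg / (e_avg + e_max), e_max / (e_avg + e_max)
    fused = [k_avg * a + k_max * m for a, m in zip(a_avg, a_max)]
    # Rescale every channel by its fused attention weight.
    out = [[[v * g for v in row] for row in ch] for ch, g in zip(x, fused)]
    return out, fused, (k_avg, k_max)


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
C, H, W, hidden = 4, 3, 3, 2  # toy sizes; hidden = C / reduction ratio
x = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
params = {
    "avg": (random_matrix(hidden, C, rng), random_matrix(C, hidden, rng)),
    "max": (random_matrix(hidden, C, rng), random_matrix(C, hidden, rng)),
}
out, fused, branch_weights = adaptive_channel_attention(x, params)
print("fused channel weights:", [round(g, 3) for g in fused])
print("branch weights (avg, max):", tuple(round(k, 3) for k in branch_weights))
```

Because the two branch weights sum to one, the fused vector stays in the same (0, 1) range as each sigmoid output, which is one plausible reason to prefer weighting over concatenation: the channel count of the attention vector does not change.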

What problem does this paper attempt to address?

The paper primarily focuses on improving convolutional neural network (CNN)-based object detection methods by introducing a new adaptive attention mechanism to enhance detection performance.

### Research Background and Objectives

- **Importance of Object Detection**: Object detection is a fundamental task in computer vision, crucial for other computer vision tasks such as object tracking and image segmentation.
- **Current Technical Challenges**: Although convolutional neural networks (CNNs) have made significant progress in automatically learning feature representations from images, further improving their performance remains an important research direction.
- **Research Objectives**: To design a new attention mechanism that can utilize both global and local information, selectively emphasize useful features, and suppress unimportant features.

### Main Contributions

1. **Proposed Three New Adaptive Attention Units**: Including adaptive channel attention unit, adaptive spatial attention unit, and adaptive domain attention unit.
2. **Fully Data-Driven Attention Mechanism**: The proposed attention mechanism is entirely data-driven, lightweight, and easy to apply.
3. **Experimental Validation**: The adaptive attention mechanism was applied to YOLOv3 and MobileNetv2 frameworks and tested on the KITTI and Pascal VOC datasets. The results show a significant improvement in mean Average Precision (mAP) for the enhanced models.

### Technical Details

- **Adaptive Channel Attention Unit**: Combines the advantages of global max pooling and global average pooling, with the adaptive domain attention unit weighting these two types of attention.
- **Adaptive Domain Attention Unit**: Dynamically adjusts the weights of different attention branches based on the characteristics of the input data.
- **Spatial Attention Unit**: Enhances the network's focus on spatial locations, especially in the lower layers of the network, which contain rich positional information but less semantic information.

### Experimental Results

- On the KITTI dataset, the model with the adaptive attention mechanism improved the mean Average Precision by 2.9% compared to the original YOLOv3.
- On the Pascal VOC dataset, the enhanced model improved the mean Average Precision by 1.2% compared to the original YOLOv3, and by 1.7% compared to the original MobileNetv2.

In summary, the paper effectively enhances the performance of CNN-based object detection systems by introducing an adaptive attention mechanism.
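The summary does not say how the spatial attention unit is built, so the following is only a generic sketch of spatial attention, assuming a per-pixel mask computed from channel-wise mean and max maps mixed by three hypothetical scalars (`w_mean`, `w_max`, `bias`). The paper's actual unit may differ, for example by using a convolution over a neighbourhood instead of a per-pixel mix.

```python
import math


def channel_pool(x):
    """Collapse a C x H x W map into per-pixel mean and max across channels."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    mean_map = [[sum(x[k][i][j] for k in range(c)) / c for j in range(w)]
                for i in range(h)]
    max_map = [[max(x[k][i][j] for k in range(c)) for j in range(w)]
               for i in range(h)]
    return mean_map, max_map


def spatial_attention(x, w_mean=1.0, w_max=1.0, bias=0.0):
    """Return the reweighted feature map and the H x W attention mask."""
    mean_map, max_map = channel_pool(x)
    # Assumed gate: sigmoid of a per-pixel linear mix of the two pooled maps.
    mask = [[1.0 / (1.0 + math.exp(-(w_mean * m + w_max * n + bias)))
             for m, n in zip(mean_row, max_row)]
            for mean_row, max_row in zip(mean_map, max_map)]
    # Every channel is scaled by the same per-location mask.
    out = [[[v * g for v, g in zip(row, mask_row)]
            for row, mask_row in zip(ch, mask)]
           for ch in x]
    return out, mask


# Toy input: two channels, 2 x 2, with strong activations at the bottom-right.
features = [
    [[0.0, 0.0], [0.0, 4.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
reweighted, mask = spatial_attention(features, bias=-2.0)
for row in mask:
    print([round(g, 3) for g in row])
```

In this toy input the bottom-right location receives the largest mask value, so its activations pass through almost unchanged while the mask at the other locations is small.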

Object detection based on an adaptive attention mechanism

An Image Object Detection Model Based on Mixed Attention Mechanism Optimized YOLOv5

An Adaptive Attention Fusion Mechanism Convolutional Network for Object Detection in Remote Sensing Images

Local Attention Sequence Model for Video Object Detection

Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration

Multi-scale coupled attention for visual object detection

Small object detection based on attention mechanism and enhanced network

Multibranch Attention Mechanism Based on Channel and Spatial Attention Fusion

Attention CoupleNet: Fully Convolutional Attention Coupling Network for Object Detection

YOLO V4 with hybrid dilated convolution attention module for object detection in the aerial dataset

DAF-Net: dense attention feature pyramid network for multiscale object detection

Lightweight Spatial Sliced-Concatenate-Multireceptive-Field Enhance and Joint Channel Attention Mechanism for Infrared Object Detection

Research on 3D Object Detection Method Based on Convolutional Attention Mechanism

Attention Mechanism and Detection Box Information Based Real-time Multi-Object Vehicle Detection

Adaptively Attentional Feature Fusion Oriented to Multiscale Object Detection in Remote Sensing Images

SA-YOLOv3: An Efficient and Accurate Object Detector Using Self-Attention Mechanism for Autonomous Driving

HAR-Net: Joint Learning of Hybrid Attention for Single-Stage Object Detection

Adaptive Attention Module for Image Recognition Systems in Autonomous Driving

Adaptive Anchor Box Mechanism to Improve the Accuracy in the Object Detection System

Few-Shot Object Detection Based on Adaptive Attention Mechanism and Large-Margin Softmax

L4Net: an Anchor-free Generic Object Detector with Attention Mechanism for Autonomous Driving