Abstract: Compared with two-stage object detection algorithms, one-stage algorithms provide a better trade-off between real-time performance and accuracy. However, these methods treat intermediate features equally and therefore lack the flexibility to emphasize the information that matters for classification and localization. They also ignore the interaction of contextual information across scales, which is important for detecting medium and small objects. To tackle these problems, we propose an image pyramid network based on a dual attention mechanism (DAIPNet) for one-stage object detection. It builds an image pyramid to enrich spatial information and uses the dual attention mechanism to emphasize informative multi-scale features. Our framework uses a pre-trained backbone as the standard detection network, and the designed image pyramid network (IPN) serves as an auxiliary network that provides complementary information. The dual attention mechanism is composed of the adaptive feature fusion module (AFFM) and the progressive attention fusion module (PAFM). AFFM is designed to automatically weight the feature maps from the backbone and the auxiliary network according to their importance, while PAFM adaptively learns channel-wise attention during the context transfer process. Furthermore, in the IPN, we build an image pyramid to extract scale-wise features from images downsampled to different scales; these features are further fused at different states to enrich scale-wise information and learn more comprehensive feature representations. Experiments are conducted on the MS COCO dataset. With a 300 × 300 input, our proposed detector achieves 32.6% mAP on MS COCO test-dev, a superior result compared with state-of-the-art methods.
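As a rough illustration of two ideas named in the abstract, the sketch below builds a small image pyramid by repeated downsampling and then fuses two same-sized feature maps with normalized importance weights. It is not the DAIPNet implementation. The 2x2 average pooling, the softmax weighting, and every function name here (`downsample_2x`, `build_image_pyramid`, `fuse_feature_maps`) are assumptions made for illustration; the abstract does not specify the downsampling method or the fusion formula, and in a real network the fusion scores would be learned rather than passed in by hand.

```python
import math


def downsample_2x(image):
    """Halve height and width by averaging each 2x2 block.

    Assumption: the abstract does not name the downsampling method.
    """
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * i][2 * j] + image[2 * i][2 * j + 1]
             + image[2 * i + 1][2 * j] + image[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w)
        ]
        for i in range(h)
    ]


def build_image_pyramid(image, levels):
    """Return [image, image/2, image/4, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_feature_maps(feature_maps, scores):
    """Weighted sum of same-sized 2D maps.

    The weights are softmax(scores). Hypothetical stand-in for adaptive
    fusion: a trained model would predict the scores itself.
    """
    weights = softmax(scores)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(wt * fm[i][j] for wt, fm in zip(weights, feature_maps))
         for j in range(w)]
        for i in range(h)
    ]


# Toy 8x8 single-channel "image" and a three-level pyramid.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = build_image_pyramid(image, levels=3)
sizes = [(len(p), len(p[0])) for p in pyramid]

# Fuse a made-up 2x2 "backbone" map with the coarsest pyramid level.
backbone_map = [[1.0, 2.0], [3.0, 4.0]]
fused = fuse_feature_maps([backbone_map, pyramid[2]], scores=[0.0, 0.0])
```

With equal scores the two maps are simply averaged; raising one score shifts the fused map toward that source, which is the behavior an importance-weighted fusion is meant to provide.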

A Transformer-based Dual Position Attention Network for Recognizing Human-object Interaction

A Transformer-Based Object Detector with Coarse-Fine Crossing Representations

Human–object interaction detection based on disentangled axial attention transformer

Pairwise CNN-Transformer Features for Human–Object Interaction Detection

Human-object interaction detection based on cascade multi-scale transformer

EBiDA-FPN: Enhanced Bi-Directional Attention Feature Pyramid Network for Object Detection

Parallel disentangling network for human–object interaction detection

A Novel Part Refinement Tandem Transformer for Human-Object Interaction Detection

Category-Aware Transformer Network for Better Human-Object Interaction Detection

Mask-Guided Transformer for Human-Object Interaction Detection

Focus and Adjust: Progressive Refinement Network for Human Object Interaction Detection

Multi-Scale Human-Object Interaction Detector

Transformer-Based Cross-Modal Integration Network for RGB-T Salient Object Detection

Two-stream Multi-level Dynamic Point Transformer for Two-person Interaction Recognition

Dual Attention Based Image Pyramid Network for Object Detection

Dual Position Relationship Transformer for Image Captioning

DPIT: Dual-Pipeline Integrated Transformer for Human Pose Estimation

DTSSD: Dual-Channel Transformer-Based Network for Point-Based 3D Object Detection

DTT-CGINet: A Dual Temporal Transformer Network with Multi-Scale Contour-Guided Graph Interaction for Change Detection

Reasoning About Human-Object Interactions Through Dual Attention Networks

Human-Object Interaction Detection via Disentangled Transformer