AParC-DETR: Accelerate DETR training by introducing Adaptive Position-aware Circular Convolution

Ya’nan Guan, Shujiao Liao, Wenyuan Yang
DOI: https://doi.org/10.1007/s00371-024-03422-2
IF: 2.835
2024-05-08
The Visual Computer
Abstract: Detection Transformer (DETR) is a more concise detection paradigm that eliminates hand-crafted designs and interventions. However, previous DETR models have difficulty locating sensitive local positions when processing images, which leads to slow convergence during training. In this article, we introduce the Adaptive Position-Aware Circular Convolution DEtection TRansformer (AParC-DETR), which has a global receptive field and can perceive sensitive local features, improving the model's adaptability while limiting the increase in computation. Groups of particles are sampled from the 3D space to encode the content vector. The content vector and positional vector are updated through Multi-Head Self-Attention with Boundary Information. The channel and spatial features are mixed through Adaptive Position-Aware Circular Convolution Global Mixing to obtain the mixed feature matrix. An Adaptive Gating Channel Mixing (AGCM) mechanism with a gate control branch is incorporated to improve adaptability while limiting computational cost. Position-Aware Spatial Mixing (PASM) extends the receptive field to the global level at lower computational cost, using instance kernels and position embedding strategies. The category and bounding box outputs are decoupled in the detection results to avoid interference from mutual coupling. In object detection on MS COCO, AParC-DETR achieves an AP of 44.2 after 12 epochs of training, which improves to 45.8 after 36 epochs. Moreover, ablation studies are performed on the Cityscapes dataset to analyze AParC-DETR's computational efficiency. With 119 G FLOPs and 136 M reference counts, AParC-DETR attains 23 FPS. The code is available at https://github.com/Guanyn/AParC-DETR.git.
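As a rough illustration of how a position-aware circular convolution can reach a global receptive field, the following minimal 1D sketch may help. It is not the authors' implementation: the function names `circular_conv1d` and `position_aware_circular_conv1d`, the use of plain Python lists, and the choice to add the position embedding before the convolution are all assumptions made for illustration only.

```python
def circular_conv1d(x, kernel):
    """Wrap-around (circular) correlation of a 1D signal with a kernel of equal length.

    Because the kernel is as long as the input and indexing wraps around,
    every output element can depend on every input element.
    """
    n = len(x)
    if len(kernel) != n:
        raise ValueError("kernel must have the same length as the input")
    return [
        sum(kernel[k] * x[(i + k) % n] for k in range(n))
        for i in range(n)
    ]


def position_aware_circular_conv1d(x, kernel, pos_embed):
    """Hypothetical position-aware variant (an assumption, not the paper's code).

    A position embedding is added to the input first, so that the
    otherwise shift-equivariant circular operation can tell positions apart.
    """
    if len(pos_embed) != len(x):
        raise ValueError("pos_embed must have the same length as the input")
    shifted = [xi + pi for xi, pi in zip(x, pos_embed)]
    return circular_conv1d(shifted, kernel)
```

With an all-ones kernel, each output equals the sum of the whole input, which shows that a single layer of this kind already covers every position. The actual model operates on 2D feature maps with learned kernels and embeddings; the repository linked in the abstract is the authoritative reference.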
computer science, software engineering