ABC-Trans: a novel adaptive border-augmented cross-attention transformer for object detection

Qianjun Zhang, Pan Wang, Zihao Wu, Binhong Yang, Jin Yuan
DOI: https://doi.org/10.1007/s11042-024-19405-3
IF: 2.577
2024-06-23
Multimedia Tools and Applications
Abstract: Transformer-based visual object detection has demonstrated superior performance because it removes the need for many hand-designed components, such as anchor generation and non-maximum suppression. This paper presents a novel Adaptive Border-augmented Cross-attention Transformer (ABC-Trans) for vision-based object detection. Integrating ideas from the classic DETR and Deformable DETR, we design an adaptive cross-attention module that performs global and local point sampling simultaneously and fuses the resulting features according to an estimated weight, capturing representative features for both small and large objects. On this basis, we further introduce a border-augmented cross-attention module that incorporates notable border features for object detection. Border features represent objects well and distinguish them from backgrounds, thereby helping the model predict objects accurately. Extensive experiments on the MSCOCO and Pascal VOC datasets demonstrate the effectiveness of the proposed components, which achieve promising performance compared with classic transformer-based detection approaches.
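The weighted fusion of global and local features described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the weight is estimated, so the sigmoid gate on the query, the use of fixed sample indices in place of learned sampling offsets, and all function and parameter names (`global_attention`, `local_attention`, `adaptive_fusion`, `gate_w`, `gate_b`) are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def global_attention(query, keys, values):
    """DETR-style branch: the query attends to every position."""
    return attend(query, keys, values)


def local_attention(query, keys, values, sample_idx):
    """Deformable-DETR-style branch, simplified: attend only to a few
    sampled positions (fixed indices here, an assumption; the real method
    predicts sampling offsets)."""
    return attend(query, [keys[i] for i in sample_idx],
                  [values[i] for i in sample_idx])


def adaptive_fusion(query, keys, values, sample_idx, gate_w, gate_b):
    """Blend both branches with a scalar weight alpha in (0, 1).
    The sigmoid gate on the query is a hypothetical choice."""
    g = global_attention(query, keys, values)
    l = local_attention(query, keys, values, sample_idx)
    alpha = 1.0 / (1.0 + math.exp(-(dot(gate_w, query) + gate_b)))
    fused = [alpha * gi + (1.0 - alpha) * li for gi, li in zip(g, l)]
    return fused, alpha
```

Under these assumptions, an alpha near 1 favors the global branch and an alpha near 0 favors the local branch, which is one simple way a single module could adapt to objects of different sizes.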
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering