A dynamic attention mechanism for object detection in road or strip environments

Zhang, Guowei; Wang, Li
DOI: https://doi.org/10.1007/s00371-024-03653-3
IF: 2.835
2024-09-26
The Visual Computer
Abstract: Road target recognition hinges on achieving high speed and efficiency. Deformable DETR adds a deformable attention module to DETR to achieve fast, efficient object detection. However, effectively perceiving complex structured objects and small targets in road environments remains a key challenge in visual recognition. To meet these requirements, this article proposes a new attention mechanism, DDA (dynamic deformable attention), built on Deformable DETR. DDA is a scale-adaptive attention module that shifts key sampling points toward the target features of each detection target. When facing different detection targets, the attention mechanism can autonomously determine the appropriate receptive field. DDA improves granularity perception by 8 times compared to Deformable DETR and enhances the perception of target contours and small targets in complex environments. Experiments on the COCO2017, WIDER Person Challenge, and DAWN benchmarks demonstrate its effectiveness. Additionally, the attention mechanism recognizes small targets more effectively: compared to similar models, DDA raises the small-target detection metric (APS) by 1.2. Code is available at: https://github.com/Cigol1997/DDANet.
computer science, software engineering
What problem does this paper attempt to address?