AD-YOLO: A Real-Time YOLO Network with Swin Transformer and Attention Mechanism for Airport Scene Detection

Wentao Zhou, Chengtao Cai, Chenming Li, Hao Xu, Haochen Shi
DOI: https://doi.org/10.1109/tim.2024.3472805
2024-01-01
Abstract: Real-time acquisition of airport scene information is crucial for airport safety and for optimizing airport utilization efficiency. However, detecting airport objects remains challenging because person and vehicle targets in airport scene images are small, public airport data are insufficient, and so on, which makes it difficult to achieve high accuracy and real-time detection in the airport scene simultaneously. This article proposes a novel airport object detection approach, airport detector-YOLO (AD-YOLO), that addresses this challenge by integrating the advantages of an improved you only look once (YOLO) network, the Swin Transformer, and an attention mechanism. Specifically, we introduce the Swin Transformer, which retains the Transformer's global attention for obtaining features while reducing its computational complexity, into the YOLOv7-based head network to improve the fusion of high-dimensional feature information. We also design an efficient channel spatial attention (ECSA) module and introduce a small object detection layer (SODL) to improve the detection accuracy of small targets in the airport scene. We test the proposed method on the self-constructed multiple airport surveillance dataset (MASD), which contains 5736 images captured from actual airport and online airport video. The experimental results show that AD-YOLO achieves 71.6% mean average precision (mAP), exceeding the mAP of the baseline method by 4.4%. The proposed method runs at 101.4 frames/s (FPS) on the NVIDIA RTX3080 GPU and 17.8 FPS on the Jetson Orin NX, meeting the real-time and accuracy requirements of the airport scene. Finally, the experimental results on the public airport surface surveillance (ASS) dataset show that AD-YOLO outperforms other detection methods, demonstrating its effectiveness.
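To put the reported throughput in per-frame terms, the following minimal sketch converts the two FPS figures from the abstract into average time per frame. The helper name `frame_latency_ms` is hypothetical and not from the paper, and the conversion assumes frames are processed one at a time, so that latency is simply the reciprocal of throughput; the abstract does not state the batch size or measurement protocol.

```python
def frame_latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given throughput.

    Assumption: frames are processed sequentially, so latency = 1 / FPS.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# FPS figures as reported in the abstract.
reported_fps = {"NVIDIA RTX3080": 101.4, "Jetson Orin NX": 17.8}

latencies = {device: frame_latency_ms(fps) for device, fps in reported_fps.items()}

for device, ms in latencies.items():
    print(f"{device}: {ms:.1f} ms per frame")
```

Under that assumption, the figures correspond to roughly 10 ms per frame on the desktop GPU and roughly 56 ms per frame on the embedded board.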