PVswin-YOLOv8s: UAV-Based Pedestrian and Vehicle Detection for Traffic Management in Smart Cities Using Improved YOLOv8

Noor Ul Ain Tahir, Zhe Long, Zuping Zhang, Muhammad Asim, Mohammed ELAffendi
DOI: https://doi.org/10.3390/drones8030084
IF: 5.532
2024-02-28
Drones
Abstract: In smart cities, effective traffic congestion management hinges on adept pedestrian and vehicle detection. Unmanned Aerial Vehicles (UAVs) offer a solution with mobility, cost-effectiveness, and a wide field of view, and yet, optimizing recognition models is crucial to surmounting challenges posed by small and occluded objects. To address these issues, we utilize the YOLOv8s model and a Swin Transformer block and introduce the PVswin-YOLOv8s model for pedestrian and vehicle detection based on UAVs. Firstly, the backbone network of YOLOv8s incorporates the Swin Transformer model for global feature extraction for small object detection. Secondly, to address the challenge of missed detections, we opt to integrate the CBAM into the neck of the YOLOv8. Both the channel and the spatial attention modules are used in this addition because of how well they extract feature information flow across the network. Finally, we employ Soft-NMS to improve the accuracy of pedestrian and vehicle detection in occlusion situations. Soft-NMS increases performance and manages overlapped boundary boxes well. The proposed network reduced the fraction of small objects overlooked and enhanced model detection performance. Performance comparisons with different YOLO versions (for example, YOLOv3 extremely small, YOLOv5, YOLOv6, and YOLOv7), YOLOv8 variants (YOLOv8n, YOLOv8s, YOLOv8m, and YOLOv8l), and classical object detectors (Faster-RCNN, Cascade R-CNN, RetinaNet, and CenterNet) were used to validate the superiority of the proposed PVswin-YOLOv8s model. The efficiency of the PVswin-YOLOv8s model was confirmed by the experimental findings, which showed a 4.8% increase in average detection accuracy (mAP) compared to YOLOv8s on the VisDrone2019 dataset.
What problem does this paper attempt to address?
The paper addresses pedestrian and vehicle detection from unmanned aerial vehicles (UAVs) as a basis for managing traffic congestion in smart cities. It focuses on three challenges:

1. **Small-target detection**: In images taken by UAVs, pedestrians and vehicles often occupy only a few pixels, and traditional detection algorithms tend to perform poorly on such small targets.
2. **Occlusion**: In complex traffic scenes, pedestrians and vehicles are often partially occluded, which makes detectors prone to missed or false detections.
3. **Background complexity**: UAV images usually have cluttered backgrounds containing a wide variety of elements, which makes the detection task harder.

To address these challenges, the paper proposes a model named PVswin-YOLOv8s. It is based on YOLOv8s and integrates Swin Transformer blocks and the Convolutional Block Attention Module (CBAM) to improve detection of small and occluded targets. The specific changes are:

- **Swin Transformer**: Swin Transformer blocks are added to the YOLOv8s backbone to strengthen global feature extraction, especially for small targets.
- **CBAM**: CBAM modules are added to the YOLOv8s neck. Their channel-attention and spatial-attention mechanisms improve the flow of feature information and thereby reduce missed detections.
- **Soft-NMS**: Soft-NMS replaces traditional NMS so that overlapping bounding boxes, which are common under occlusion, are handled better.

With these changes, PVswin-YOLOv8s reaches a mean average precision (mAP) on the VisDrone2019 dataset that is 4.8% higher than that of YOLOv8s, which the paper presents as evidence that the model handles the problems above better than the baseline.
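To make the Soft-NMS step concrete, the following is a minimal NumPy sketch of the Gaussian variant of Soft-NMS. It is an illustration only, not the authors' implementation: the summary above does not say which Soft-NMS variant or parameters the paper uses, so the Gaussian decay, the `sigma` and `score_thresh` defaults, and the function names `iou_one_to_many` and `soft_nms` are all assumptions made for this example.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes in [x1, y1, x2, y2] form."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (assumed variant; sigma and score_thresh are illustrative).

    Instead of deleting every box that overlaps the current best box, each
    overlapping box has its score multiplied by exp(-IoU^2 / sigma). A box is
    dropped only once its decayed score falls to score_thresh or below.
    Returns the kept indices in selection order and their final scores.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    remaining = np.arange(len(scores))
    keep = []
    while remaining.size > 0:
        # Select the highest-scoring box among those still under consideration.
        best = remaining[np.argmax(scores[remaining])]
        keep.append(int(best))
        remaining = remaining[remaining != best]
        if remaining.size == 0:
            break
        # Decay the scores of the other boxes according to their overlap with it.
        overlap = iou_one_to_many(boxes[best], boxes[remaining])
        scores[remaining] *= np.exp(-(overlap ** 2) / sigma)
        # Discard boxes whose score has decayed to the threshold or below.
        remaining = remaining[scores[remaining] > score_thresh]
    return keep, scores[keep]


# Two heavily overlapping boxes (e.g. two adjacent pedestrians) and one distant box.
example_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
example_scores = [0.9, 0.8, 0.7]
kept, new_scores = soft_nms(example_boxes, example_scores)
print(kept, new_scores)
```

In this example the second box overlaps the first with an IoU of about 0.68. Hard NMS with a typical IoU threshold of 0.5 would discard it outright, whereas Soft-NMS keeps it with a reduced score, which is the behaviour that helps when two real objects occlude each other.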