Abstract: This work presents advancements in multi-class vehicle detection using UAV cameras through the development of spatiotemporal object detection models. The study introduces a Spatio-Temporal Vehicle Detection Dataset (STVD) containing 6,600 annotated sequential frame images captured by UAVs, enabling comprehensive training and evaluation of algorithms for holistic spatiotemporal perception. A YOLO-based object detection algorithm is enhanced to incorporate temporal dynamics, resulting in improved performance over single-frame models. Integrating attention mechanisms into the spatiotemporal models is shown to enhance performance further. Experimental validation demonstrates significant progress: the best spatiotemporal model exhibits a 16.22% improvement over single-frame models, and attention mechanisms show potential for additional performance gains.

What problem does this paper attempt to address?

This paper asks how to improve multi-class vehicle detection from unmanned aerial vehicle (UAV) cameras in traffic monitoring by developing spatio-temporal object detection models. Specifically, it aims to overcome the challenges that existing single-frame models face when processing video data, such as occlusion, motion blur, and illumination changes, and to use spatio-temporal information to improve detection accuracy and consistency.

### Problem Background

Traditional object detection methods focus mainly on processing a single image. Although these methods have made significant progress on static images, they perform poorly in dynamic scenes because they cannot use information along the time dimension. For example, single-frame models are prone to occlusion problems in dense areas, struggle to keep detection results consistent between frames, and can only access the limited features of a single frame.

### Solution

To address these problems, the paper proposes the following innovations:

1. **Constructing a Spatio-Temporal Vehicle Detection Dataset (STVD)**:
   - Created a dataset of 6,600 annotated sequential frame images covering three vehicle types (cars, trucks, buses), taken from aerial-view videos shot by UAVs over different road sections in the Republic of Cyprus.
2. **Improving the YOLOv5 Framework to Process Spatio-Temporal Data**:
   - Extended the YOLOv5 object detection framework through architecture enhancements and input representation changes so that it can process spatio-temporal data and better capture temporal dynamics.
3. **Introducing an Attention Mechanism**:
   - Integrated an attention mechanism into the spatio-temporal model, further improving model performance.
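The summary does not say how the input representation is changed. As an illustration only, the sketch below shows one common way to hand a single-frame detector several consecutive frames: concatenating them along the channel axis. The function names `stack_clip` and `sliding_clips`, the clip length of 3, and the toy frame size are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def stack_clip(frames):
    """Concatenate T frames of shape (H, W, C) into one (H, W, T*C) array.

    Hypothetical helper: channel stacking is one possible spatio-temporal
    input representation, not necessarily the one used in the paper.
    """
    frames = [np.asarray(f) for f in frames]
    if not frames:
        raise ValueError("need at least one frame")
    shape = frames[0].shape
    if any(f.shape != shape for f in frames):
        raise ValueError("all frames must share the same shape")
    # Frame order is preserved: channels 0..C-1 come from the oldest frame.
    return np.concatenate(frames, axis=-1)


def sliding_clips(video, clip_len=3):
    """Yield one stacked clip per window of clip_len consecutive frames."""
    for start in range(len(video) - clip_len + 1):
        yield stack_clip(video[start:start + clip_len])


# Toy video: 5 RGB frames of size 4x6, where frame i is filled with value i.
video = [np.full((4, 6, 3), i, dtype=np.uint8) for i in range(5)]
clips = list(sliding_clips(video, clip_len=3))
print(len(clips), clips[0].shape)
```

With this representation, the first convolution of the detector would need to accept `clip_len * 3` input channels instead of 3, which is one example of the kind of architecture change the summary alludes to.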
Experimental results show that the best spatio-temporal model improves performance by 16.22% compared to the single-frame model, and that the attention mechanism has the potential to bring further gains.

### Experimental Verification

Through a series of experiments, the authors conducted quantitative and qualitative analyses of the different models and examined performance for each vehicle type. The results show that the spatio-temporal model makes significant progress in many respects, especially in handling complex traffic scenes.

### Impact and Application

This research improves the real-time detection ability of UAVs in traffic monitoring and provides technical support for future intelligent transportation systems. More accurate traffic flow analysis, congestion point identification, and accident detection can help optimize traffic signal control, reroute traffic, and prevent secondary accidents. In the long run, these data can also serve as a reference for urban planners deciding on road expansion and infrastructure construction.

In conclusion, by introducing spatio-temporal information and an attention mechanism, this paper addresses the limitations of traditional single-frame models in video processing and offers new ideas and technical means for more efficient traffic monitoring.
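The summary does not specify which attention mechanism is used. To make the general idea concrete, the sketch below weights per-frame feature vectors by their scaled dot-product similarity to the current frame and fuses them into one vector. The function `temporal_attention`, the choice of the last frame as the query, and the feature shapes are assumptions for illustration, not the authors' design.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_attention(features, query_index=-1):
    """Fuse per-frame features of shape (T, D) into a single (D,) vector.

    Hypothetical sketch: each frame is scored by its scaled dot product
    with the query frame (the last frame by default), the scores are
    normalized with a softmax, and the frames are averaged with those weights.
    """
    features = np.asarray(features, dtype=np.float64)
    query = features[query_index]
    scores = features @ query / np.sqrt(features.shape[1])
    weights = softmax(scores)
    return weights @ features, weights


# Three frames with identical features receive equal attention weights.
feats = np.ones((3, 4))
fused, weights = temporal_attention(feats)
print(weights, fused)
```

Frames that resemble the current frame contribute more to the fused feature, which is one plausible way for temporal context to help when a vehicle is briefly occluded or blurred.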

Spatiotemporal Object Detection for Improved Aerial Vehicle Detection in Traffic Monitoring
