Abstract: UAV multitarget detection plays a pivotal role in civil and military fields. Although deep learning methods offer an effective solution to this task, variations in target size and shape, occlusion, and lighting conditions from the drone's perspective still pose great challenges. To address these problems, this paper proposes an aerial image detection model with strong performance and robustness. First, because small targets in aerial images are prone to false and missed detections, the idea of Bi-PAN-FPN is introduced to improve the neck of YOLOv8-s. By fully considering and reusing multiscale features, a more complete feature fusion process is achieved while keeping the parameter count as close to the baseline as possible. Second, the GhostblockV2 structure replaces part of the C2f modules in the backbone of the baseline model, which suppresses information loss during long-distance feature transmission while significantly reducing the number of model parameters. Finally, WiseIoU loss is used as the bounding box regression loss. Combined with a dynamic nonmonotonic focusing mechanism, it evaluates the quality of anchor boxes by their "outlier" degree, so that the detector takes anchor boxes of different quality into account and the overall detection performance improves. The algorithm is evaluated on the widely used VisDrone2019 dataset, and detailed ablation, comparison, interpretability, and self-built dataset experiments are designed to verify the effectiveness and feasibility of the proposed model. The results show that the proposed aerial image detection model has clear advantages across these experiments, which offers a new approach for deploying deep learning in UAV multitarget detection.
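
The abstract names the WiseIoU loss and its dynamic nonmonotonic focusing mechanism but does not state the formula. The sketch below illustrates the idea under the assumption that the variant meant is the commonly published Wise-IoU v3 form: the IoU loss is scaled by a center-distance term and by a gain computed from the outlier degree (a box's IoU loss divided by the mean IoU loss). The function names, the box format `(x1, y1, x2, y2)`, and the values `alpha=1.9` and `delta=3.0` are illustrative assumptions, not details taken from this paper.

```python
import math


def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def focusing_gain(beta, alpha=1.9, delta=3.0):
    """Nonmonotonic gain as a function of the outlier degree beta.

    The gain is small for very low beta (high-quality boxes) and for
    very high beta (low-quality boxes), and equals 1 when beta == delta.
    alpha and delta are assumed hyperparameter values.
    """
    return beta / (delta * alpha ** (beta - delta))


def wiou_v3(pred, target, mean_iou_loss, alpha=1.9, delta=3.0):
    """Sketch of a Wise-IoU v3 style loss for one predicted box.

    mean_iou_loss stands in for the running mean of the IoU loss over
    recent batches. In a real training loop the enclosing-box size and
    beta would be treated as constants (no gradient flows through them).
    """
    l_iou = 1.0 - iou_xyxy(pred, target)

    # Width and height of the smallest box enclosing both boxes.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])

    # Squared distance between the two box centers.
    dx = (pred[0] + pred[2]) / 2 - (target[0] + target[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (target[1] + target[3]) / 2
    r_wiou = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))

    beta = l_iou / mean_iou_loss  # outlier degree
    return focusing_gain(beta, alpha, delta) * r_wiou * l_iou
```

With these assumed hyperparameters the gain peaks at a moderate outlier degree, so boxes of ordinary quality receive the most gradient while both the best and the worst boxes are down-weighted.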

Model-guided Multi-path Knowledge Aggregation for Aerial Saliency Prediction

How Drones Look: Crowdsourced Knowledge Transfer for Aerial Video Saliency Prediction

Aerial-PASS: Panoramic Annular Scene Segmentation in Drone Videos

Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection

Video saliency prediction for first-person view UAV videos: Dataset and benchmark

Three-Dimensional Drone Exploration with Saliency Prediction in Real Unknown Environments

Video Saliency Prediction Using Enhanced Spatiotemporal Alignment Network

YOLO-U: multi-task model for vehicle detection and road segmentation in UAV aerial imagery

Drone Detection Method Based on MobileViT and CA-PANet

Multi-scale Feature Extraction and Fusion Net: Research on UAVs Image Semantic Segmentation Technology

Alleviating Spatial Misalignment and Motion Interference for UAV-based Video Recognition

Revisiting Video Saliency Prediction in the Deep Learning Era

Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?

Hybrid Attention Spatial-Temporal Network for Video Saliency Prediction

Multi-Branch Parallel Networks for Object Detection in High-Resolution UAV Remote Sensing Images

A Modified YOLOv8 Detection Network for UAV Aerial Image Recognition

Transformer-based Multi-scale Feature Integration Network for Video Saliency Prediction

Probabilistic Multi-Task Learning for Visual Saliency Estimation in Video

TAFormer: A Unified Target-Aware Transformer for Video and Motion Joint Prediction in Aerial Scenes

SDAPNet: End-to-End Multi-task Simultaneous Detection and Prediction Network