Abstract: Vehicle detection in optical remote sensing images has been widely applied in recent years. However, several challenges in remote sensing vehicle detection remain unsolved. First, vehicles are densely distributed at arbitrary angles, which makes them difficult to detect. Second, the large number of model parameters (Params) hinders real-time detection. Third, the large feature differences among larger vehicles reduce detection precision. Finally, the unbalanced class distribution of vehicle datasets is not conducive to training. To address these challenges, this paper first constructs a small vehicle dataset, MiVehicle, which contains 3000 pairs of corresponding infrared and visible images and offers a more balanced class distribution. In the infrared part of the dataset, the proportions of the vehicle types are as follows: cars, 48%; buses, 19%; trucks, 15%; freight cars, 10%; and vans, 8%. Second, we adopt rotated bounding boxes for detection and build a new vehicle detector, ML-Det, with a novel multi-scale feature fusion network, the triple cross-criss FPN (TCFPN), which effectively captures vehicle features at three different positions and improves mAP by 1.97%. Moreover, we propose LKC–INVO, which couples involution with the structure of multiple large-kernel convolutions and increases mAP by 2.86%. We also introduce a novel C2F_ContextGuided module with global context perception, which enhances the model's perception of the global scope while minimizing its Params. Finally, we propose an assemble–disperse attention module that aggregates local features to improve performance. Overall, ML-Det achieves a 3.22% improvement in accuracy while keeping Params almost unchanged.
On the self-built small MiVehicle dataset, ML-Det achieves 70.44% mAP on visible images and 79.12% mAP on infrared images with 20.1 GFLOPs, 78.8 FPS, and 7.91 M Params. Additionally, we trained and tested our model on the public UCAS-AOD and DOTA datasets, where ML-Det outperformed many other advanced object detection algorithms.
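To illustrate why rotated boxes suit densely parked vehicles at arbitrary angles, the sketch below converts a rotated box (centre, width, height, angle) into its corners and compares it with the smallest axis-aligned box that encloses it. This is not ML-Det's implementation. The angle convention (radians, counter-clockwise from the x-axis), the helper names, and the 4 x 2 vehicle geometry are all hypothetical choices made for illustration.

```python
import math


def rotated_box_corners(cx, cy, w, h, theta):
    """Corners of a rotated box with centre (cx, cy), size (w, h) and angle theta.

    Assumed convention: theta in radians, counter-clockwise from the x-axis,
    with w measured along the rotated x-axis.
    """
    c, s = math.cos(theta), math.sin(theta)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each corner offset about the centre, then translate to (cx, cy).
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def enclosing_aabb(corners):
    """Smallest axis-aligned box (x0, y0, x1, y1) containing the given corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def aabb_area(box):
    """Area of an axis-aligned box; empty boxes have zero area."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def aabb_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter = aabb_area((max(a[0], b[0]), max(a[1], b[1]),
                       min(a[2], b[2]), min(a[3], b[3])))
    return inter / (aabb_area(a) + aabb_area(b) - inter)


if __name__ == "__main__":
    w, h, theta = 4.0, 2.0, math.pi / 4  # hypothetical 4 x 2 vehicle at 45 degrees
    box_a = enclosing_aabb(rotated_box_corners(0.0, 0.0, w, h, theta))
    # A second, parallel vehicle shifted 2.5 units along its short axis:
    # the two rotated boxes are separated by a 0.5-unit gap and do not overlap.
    nx, ny = -math.sin(theta), math.cos(theta)
    box_b = enclosing_aabb(rotated_box_corners(2.5 * nx, 2.5 * ny, w, h, theta))
    print("axis-aligned area / vehicle area:", round(aabb_area(box_a) / (w * h), 2))
    print("IoU of the two axis-aligned boxes:", round(aabb_iou(box_a, box_b), 2))
```

In this toy case, the axis-aligned box covers 2.25 times the vehicle's own area, most of it background. The axis-aligned boxes of two neighbouring vehicles also overlap (IoU of about 0.2) even though the vehicles themselves do not. A detector that predicts the angle avoids both effects, which is the intuition behind choosing a rotated-box formulation for dense, arbitrarily oriented targets.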