Abstract: Vehicle detection in optical remote sensing images has been widely applied in recent years. However, several challenges remain unsolved: vehicles are densely distributed at arbitrary angles, which makes them difficult to detect; the large number of model parameters (Params) hinders real-time detection; the large differences between larger vehicles in terms of their features reduce detection precision; and the distribution of vehicle datasets is unbalanced, which is not conducive to training. First, this paper constructs a small vehicle dataset, MiVehicle. This dataset includes 3000 corresponding infrared and visible image pairs and offers a more balanced distribution. In the infrared part of the dataset, the proportions of the vehicle types are as follows: cars, 48%; buses, 19%; trucks, 15%; freight cars, 10%; and vans, 8%. Second, we adopt the rotated box mechanism for detection and build a new vehicle detector, ML-Det, with a novel multi-scale feature fusion triple cross-criss FPN (TCFPN), which effectively captures vehicle features at three different positions and improves mAP by 1.97%. Moreover, we propose LKC–INVO, which allows involution to couple the structure of multiple large kernel convolutions, resulting in an mAP increase of 2.86%. We also introduce a novel C2F_ContextGuided module with global context perception, which enhances the model's perception ability in the global scope and minimizes model Params. Finally, we propose an assemble–disperse attention module that aggregates local features to improve performance. Overall, ML-Det achieved a 3.22% improvement in accuracy while keeping Params almost unchanged.
On the self-built small MiVehicle dataset, ML-Det achieved 70.44% on visible images and 79.12% on infrared images with 20.1 GFLOPS, 78.8 FPS, and 7.91 M Params. Additionally, we trained and tested our model on two public datasets, UAS-AOD and DOTA. ML-Det outperformed many other advanced target detection algorithms.
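
The rotated box mechanism mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify ML-Det's box parameterization, so the (cx, cy, w, h, theta) form, the angle convention (radians, counter-clockwise in a y-up frame), and the function name `rotated_box_corners` are all assumptions made for illustration only.

```python
import math


def rotated_box_corners(cx, cy, w, h, theta):
    """Return the four corners of a rotated box.

    Assumed (hypothetical) parameterization: centre (cx, cy), width w,
    height h, and rotation theta in radians, counter-clockwise in a
    y-up coordinate frame. Not taken from the ML-Det paper.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Corner offsets of the axis-aligned box, relative to its centre.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta, then translate to the centre.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]
```

With theta set to 0 this reduces to an ordinary axis-aligned box, which is why a rotated representation can fit vehicles at arbitrary angles more tightly than horizontal boxes.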

Video Object Detection for Autonomous Driving: Motion-aid Feature Calibration

Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection

Fully Motion-Aware Network for Video Object Detection

Feature Aligned Recurrent Network For Causal Video Object Detection

An Online Calibration Method for Robust Multi-Modality 3D Object Detection

A Multi-view 3D Vehicle Detection Method Based On Novel 3D Proposal Generation Method

Multi-View Adaptive Fusion Network for 3D Object Detection

AMFF-Net: An Effective 3D Object Detector Based on Attention and Multi-Scale Feature Fusion

Boost Correlation Features with 3D-MiIoU-Based Camera-LiDAR Fusion for MODT in Autonomous Driving

Multi-view 3D Object Detection Network for Autonomous Driving

MAFF-Net: Filter False Positive for 3D Vehicle Detection with Multi-modal Adaptive Feature Fusion

Joint Multi-Object Detection and Tracking with Camera-LiDAR Fusion for Autonomous Driving

Deep multi-scale and multi-modal fusion for 3D object detection

Channelwise and Spatially Guided Multimodal Feature Fusion Network for 3-D Object Detection in Autonomous Vehicles

Multimodal Fusion Object Detection System for Autonomous Vehicles

A re-calibration method for object detection with multi-modal alignment bias in autonomous driving

A Novel Multi-Sensor Fusion Based Object Detection and Recognition Algorithm for Intelligent Assisted Driving

Enhancing Object Detection for Autonomous Driving by Optimizing Anchor Generation and Addressing Class Imbalance

Learnable fusion mechanisms for multimodal object detection in autonomous vehicles

A Multi-Scale Feature Fusion Based Lightweight Vehicle Target Detection Network on Aerial Optical Images