Abstract: With the rapid growth in demand for security surveillance, assisted driving, and remote sensing, object detection networks with robust environmental perception and high detection accuracy have become a research focus. However, single-modality image detection adapts poorly to changing environments: lighting conditions, fog, rain, and occlusion by obstacles such as vegetation cause information loss and reduce detection accuracy. To address these challenges, we propose IV-YOLO, an object detection network that integrates features from visible-light and infrared images. The network is based on YOLOv8 (You Only Look Once v8) and employs a dual-branch fusion structure that leverages the complementary features of infrared and visible-light images for target detection. We design a Bidirectional Pyramid Feature Fusion structure (Bi-Fusion) to integrate multimodal features effectively, reducing errors from feature redundancy and extracting fine-grained features for small object detection. We also develop a Shuffle-SPP structure that combines channel and spatial attention to enhance the focus on deep features and extract richer information through upsampling. For model optimization, we design a loss function tailored for multi-scale object detection, which accelerates convergence during training. Compared with the current state-of-the-art Dual-YOLO model, IV-YOLO achieves mAP improvements of 2.8%, 1.1%, and 2.2% on the Drone Vehicle, FLIR, and KAIST datasets, respectively. On the Drone Vehicle and FLIR datasets, IV-YOLO has a parameter count of 4.31 M and achieves a frame rate of 203.2 fps, significantly outperforming YOLOv8n (5.92 M parameters, 188.6 fps on the Drone Vehicle dataset) and YOLO-FIR (7.1 M parameters, 83.3 fps on the FLIR dataset), which had previously achieved the best performance on these datasets. These results show that IV-YOLO achieves higher real-time detection performance with lower parameter complexity, making it promising for applications in autonomous driving, public safety, and beyond.
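To put the parameter and frame-rate figures quoted above on a common scale, the following minimal sketch computes the relative change of IV-YOLO against each baseline. It uses only the numbers stated in the abstract; the helper name `relative_change` and the dictionary layout are illustrative assumptions, not part of the authors' code.

```python
# Figures quoted in the abstract: (parameters in millions, frames per second).
IV_YOLO = (4.31, 203.2)
BASELINES = {
    "YOLOv8n (Drone Vehicle)": (5.92, 188.6),
    "YOLO-FIR (FLIR)": (7.1, 83.3),
}


def relative_change(new: float, old: float) -> float:
    """Signed fractional change of `new` relative to `old` (hypothetical helper)."""
    return (new - old) / old


# For each baseline: (change in parameter count, change in frame rate).
summary = {
    name: (
        relative_change(IV_YOLO[0], params),
        relative_change(IV_YOLO[1], fps),
    )
    for name, (params, fps) in BASELINES.items()
}

for name, (d_params, d_fps) in summary.items():
    print(f"vs {name}: parameters {d_params:+.1%}, frame rate {d_fps:+.1%}")
```

Under these figures, IV-YOLO uses roughly 27% fewer parameters than YOLOv8n at about 8% higher frame rate, and roughly 39% fewer parameters than YOLO-FIR at about 2.4 times the frame rate. Note that the two baselines are reported on different datasets, so the two comparisons should be read separately.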

Dual-branch network object detection algorithm based on dual-modality fusion of visible and infrared images

An object detection algorithm based on infrared-visible dual modal feature fusion

Dual-Branch Feature Fusion Network for Salient Object Detection

Dual-YOLO Architecture from Infrared and Visible Images for Object Detection

A Lightweight SE-YOLOv3 Network for Multi-Scale Object Detection in Remote Sensing Imagery

Multispectral Object Detection Based on Multilevel Feature Fusion and Dual Feature Modulation

IV-YOLO: A Lightweight Dual-Branch Object Detection Network

YOLO-CIR: The network based on YOLO and ConvNeXt for infrared object detection

Multi-Modal Object Detection Method Based on Dual-Branch Asymmetric Attention Backbone and Feature Fusion Pyramid Network

DEYOLO: Dual-Feature-Enhancement YOLO for Cross-Modality Object Detection

Target Recognition Based on Infrared and Visible Image Fusion and Improved YOLOv8 Algorithm

Lightweight Spatial Sliced-Concatenate-Multireceptive-Field Enhance and Joint Channel Attention Mechanism for Infrared Object Detection

Object Detection in Multispectral Remote Sensing Images Based on Cross-Modal Cross-Attention

MMYFnet: Multi-Modality YOLO Fusion Network for Object Detection in Remote Sensing Images

Infrared Dim and Small Target Detection Based on Local–Global Feature Fusion

ACDF-YOLO: Attentive and Cross-Differential Fusion Network for Multimodal Remote Sensing Object Detection

YOLOFIV: Object Detection Algorithm for Around-the-Clock Aerial Remote Sensing Images by Fusing Infrared and Visible Features

Object Detection by Channel and Spatial Exchange for Multimodal Remote Sensing Imagery

An Interactively Reinforced Paradigm for Joint Infrared-Visible Image Fusion and Saliency Object Detection

DCFusion: A Dual-Frequency Cross-Enhanced Fusion Network for Infrared and Visible Image Fusion

Object Detection for Remote Sensing Based on the Enhanced YOLOv8 With WBiFPN