Abstract: Visible-infrared (VIS-IR) object detection is a challenging task that combines visible and infrared data to determine the category and location of objects in a scene. The core of the task is therefore to combine the complementary information in the visible and infrared modalities to produce more accurate detection results. Existing methods mainly suffer from an insufficient ability to perceive and combine visible-infrared modal information, and they have difficulty balancing the optimization directions of the fusion and detection tasks. To solve these problems, we propose MMI-Det, a multi-modal fusion method for visible and infrared object detection. The method effectively combines the complementary information in the visible and infrared modalities and outputs accurate and robust object information. Specifically, to improve the model's ability to perceive the environment at the visible-infrared image level, we design the Contour Enhancement Module. Furthermore, to extract complementary information from the VIS and IR modalities, we design the Fusion Focus Module, which extracts spectral features at different frequencies from the visible and infrared modalities and focuses on the key information of the object at different spatial locations. Moreover, we design the Contrast Bridge Module to improve the extraction of modality-invariant features in visible-infrared scenes. Finally, to ensure that our model balances the optimization directions of image fusion and object detection, we design the Info Guided Module to improve the effectiveness of the model's training optimization. We conduct extensive experiments on the public FLIR, M3FD, LLVIP, TNO and MSRS datasets; compared with previous methods, our method achieves better performance with strong multi-modal information perception capabilities.
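
To make the idea of combining different frequency components of the two modalities concrete, the following is a minimal sketch of a generic two-scale fusion, not the Fusion Focus Module itself. The abstract does not specify how the module works, so everything here is an assumption for illustration: the box-filter split into a low-frequency base and a high-frequency residual, the averaging and max-magnitude fusion rules, and the hypothetical function names `box_blur`, `split_bands`, and `fuse`. Images are plain lists of rows of numbers.

```python
# Illustrative two-scale fusion of a visible and an infrared image.
# NOT the paper's Fusion Focus Module: the box-filter split and the fusion
# rules below are assumptions chosen for illustration only.

def box_blur(img, radius=1):
    """Low-pass filter: mean over a square window, truncated at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def split_bands(img, radius=1):
    """Split an image into a low-frequency base and a high-frequency residual."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(row, low_row)]
            for row, low_row in zip(img, low)]
    return low, high


def fuse(vis, ir, radius=1):
    """Fuse two same-sized images: average the bases, keep the stronger detail."""
    vis_low, vis_high = split_bands(vis, radius)
    ir_low, ir_high = split_bands(ir, radius)
    fused = []
    for y in range(len(vis)):
        row = []
        for x in range(len(vis[0])):
            low = 0.5 * (vis_low[y][x] + ir_low[y][x])
            hv, hi = vis_high[y][x], ir_high[y][x]
            # Keep whichever modality has the larger-magnitude detail here.
            high = hv if abs(hv) >= abs(hi) else hi
            row.append(low + high)
        fused.append(row)
    return fused
```

Under these assumptions, fusing an image with itself returns the image unchanged, and fusing two flat images returns their mean. A learned module would replace both fixed rules with trainable weights that vary by spatial location.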

SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection

MCDet: Multi-Content Collaboration Detector for Multiscale Remote Sensing Object

MVM3Det: A Novel Method for Multi-view Monocular 3D Detection

Uni^2Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection

Multimodal Collaboration Networks for Geospatial Vehicle Detection in Dense, Occluded, and Large-Scale Events

MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection

Shallow Multiplexing and Multiscale Dilation Convolution Combined Attention Based Oriented Object Detection in Remote Sensing Images

RemoteDet-Mamba: A Hybrid Mamba-CNN Network for Multi-modal Object Detection in Remote Sensing Images

DMM: Disparity-guided Multispectral Mamba for Oriented Object Detection in Remote Sensing

Simple Multi-dataset Detection

Multiview Detection with Feature Perspective Transformation

Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble

MMI-Det: Exploring Multi-Modal Integration for Visible and Infrared Object Detection

MEDMCN: a Novel Multi-Modal EfficientDet with Multi-Scale CapsNet for Object Detection

Multi-Feature Information Complementary Detector: A High-Precision Object Detection Model for Remote Sensing Images

OmDet: Large‐scale vision‐language multi‐dataset pre‐training with multimodal detection network

Low-Rank Multimodal Remote Sensing Object Detection With Frequency Filtering Experts

MHLDet: A Multi-Scale and High-Precision Lightweight Object Detector Based on Large Receptive Field and Attention Mechanism for Remote Sensing Images

One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection