Abstract: Multispectral images (e.g. visible and infrared) may be particularly useful when detecting objects with the same model in different environments (e.g. day/night outdoor scenes). To effectively use the different spectra, the main technical problem resides in the information fusion process. In this paper, we propose a new halfway feature fusion method for neural networks that leverages the complementary/consistency balance existing in multispectral features by adding to the network architecture, a particular module that cyclically fuses and refines each spectral feature. We evaluate the effectiveness of our fusion method on two challenging multispectral datasets for object detection. Our results show that implementing our Cyclic Fuse-and-Refine module in any network improves the performance on both datasets compared to other state-of-the-art multispectral object detection methods.

What problem does this paper attempt to address?

The paper addresses how to fuse information from multispectral images (such as visible-light and infrared images) effectively when a single model must detect objects in different environments (for example, outdoor scenes during the day and at night). Specifically, it focuses on designing a feature-fusion method for neural networks that balances the complementarity and consistency of multispectral features, thereby improving object-detection performance.

### Main Problems

1. **Complementarity and Inconsistency of Multispectral Images**:
   - Visible-light images usually provide color and texture details, while infrared images are sensitive to the temperature of objects, which is especially useful at night.
   - Because images from different spectra provide different views of the same scene, the extracted features may be inconsistent, making fusion difficult and error-prone.
2. **Limitations of Existing Methods**:
   - Early-fusion methods may not fully exploit the complementary information of the different spectra.
   - Late-fusion methods may not handle the inconsistency between spectra effectively.

### Solution

The paper proposes a new feature-fusion method, the **Cyclic Fuse-and-Refine (CFR) module**, which gradually improves the balance between the consistency and complementarity of features by cyclically fusing and refining each spectral feature several times within the network.

### Specific Methods

1. **Feature Fusion and Refinement**:
   - In each cycle \(i\), let \(f_i^f\) denote the fused feature, \(f_i^v\) the visible-light feature, and \(f_i^t\) the infrared (thermal) feature. Multispectral feature fusion can be represented as:
     \[ f_i^f = F(\sigma(f_{i-1}^t, f_{i-1}^v)) \]
     where \(\sigma\) is the feature concatenation operation, and \(F\) is a \(3\times3\) convolutional layer followed by a batch-normalization operation.
   - The fused feature is then added as a residual to each spectral feature for refinement:
     \[ f_i^t = H(f_{i-1}^t + f_i^f), \quad f_i^v = H(f_{i-1}^v + f_i^f) \]
     where \(H\) is an activation function (such as ReLU).
2. **Semantic Supervision**:
   - To prevent the vanishing-gradient problem and better guide the multispectral feature fusion, an auxiliary semantic-segmentation task provides separate supervision for each refined spectral feature.
   - Two pedestrian segmentation masks are predicted through a \(1\times1\) convolutional layer, one from the visible-light feature and one from the infrared feature.
3. **Final Fusion**:
   - Since the optimal number of cycles is unknown and may vary from image pair to image pair, all refined spectral features from the \(I\) cycles are aggregated to generate the final fused feature for the object-detection part of the network.
   - The aggregation is a simple element-wise average:
     \[ \frac{1}{2I}\left(\sum_{i=1}^{I} f_i^t + \sum_{i=1}^{I} f_i^v\right) \]

### Experimental Results

The paper evaluates the CFR module on two challenging multispectral datasets:

- **KAIST Multispectral Pedestrian Detection Dataset**: The model using the CFR module is reported to outperform other state-of-the-art multispectral object-detection methods in detection accuracy.
- **FLIR ADAS Dataset**: On this dataset, the CFR module is also reported to yield a notable mAP gain.

### Conclusion

By introducing the Cyclic Fuse-and-Refine module, the paper improves the balance between the consistency and complementarity of multispectral features, which in turn improves object-detection performance on both datasets.
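The fusion, refinement, and final averaging steps summarized above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function names (`conv3x3`, `batch_norm`, `cyclic_fuse_and_refine`) are hypothetical, the convolution weight is assumed to be shared across cycles, batch normalization is simplified to a per-channel normalization of a single sample without learned scale or shift, the number of cycles is arbitrary, and the auxiliary semantic-supervision branch is omitted.

```python
import numpy as np


def conv3x3(x, w):
    """Zero-padded 3x3 cross-correlation (the usual deep-learning 'convolution').

    x: (C_in, H, W) feature map; w: (C_out, C_in, 3, 3) weights.
    """
    _, h, wd = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + wd]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out


def batch_norm(x, eps=1e-5):
    """Simplified stand-in for batch norm: per-channel normalization of one sample."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def relu(x):
    return np.maximum(x, 0.0)


def cyclic_fuse_and_refine(f_t, f_v, weight, num_cycles=3):
    """Run num_cycles fuse-and-refine cycles and average all refined features.

    f_t, f_v: thermal and visible features, each (C, H, W).
    weight:   (C, 2C, 3, 3) weights of F, assumed shared across cycles.
    """
    thermal_feats, visible_feats = [], []
    for _ in range(num_cycles):
        # Fuse: f_i^f = F(sigma(f_{i-1}^t, f_{i-1}^v))
        fused = batch_norm(conv3x3(np.concatenate([f_t, f_v], axis=0), weight))
        # Refine: add the fused feature as a residual, then apply H = ReLU
        f_t = relu(f_t + fused)
        f_v = relu(f_v + fused)
        thermal_feats.append(f_t)
        visible_feats.append(f_v)
    # Final fusion: element-wise average over all 2I refined features
    final = (sum(thermal_feats) + sum(visible_feats)) / (2 * num_cycles)
    return final, thermal_feats, visible_feats


# Usage with random features (4 channels, 8x8 spatial size)
rng = np.random.default_rng(0)
f_t = rng.normal(size=(4, 8, 8))
f_v = rng.normal(size=(4, 8, 8))
weight = 0.1 * rng.normal(size=(4, 8, 3, 3))
final, thermal_feats, visible_feats = cyclic_fuse_and_refine(f_t, f_v, weight)
print(final.shape)  # (4, 8, 8)
```

Because every refined feature passes through ReLU, the averaged output is non-negative, and because the average runs over all cycles, no single cycle count has to be chosen as the best one.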

Multispectral Fusion for Object Detection with Cyclic Fuse-and-Refine Blocks

Enhanced Spectral-Spatial Fusion Network for Multispectral Object Detection in Ground-Aerial Images

Multi-spectral Image Fusion for Moving Object Detection

ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection

Background-Aware Cross-Attention Multiscale Fusion for Multispectral Object Detection

ℱ3-Net: Feature Fusion and Filtration Network for Object Detection in Optical Remote Sensing Images

Cross-Modality Fusion Transformer for Multispectral Object Detection

Multispectral Deep Neural Network Fusion Method for Low-Light Object Detection

Modality-inter Fusion and Enhancement Network for Dual-Spectral Object Detection

Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images

Multispectral Object Detection Based on Multilevel Feature Fusion and Dual Feature Modulation

Adaptive Multilevel Fusion Refinement Network for Object Detection in Remote Sensing Images

SAFuseNet: Integration of Fusion and Detection for Infrared and Visible Images

Multi-model imaging detection using a learning feature fusion module

Rethinking Early-Fusion Strategies for Improved Multispectral Object Detection

Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery

Multi-Modal Object Detection Method Based on Dual-Branch Asymmetric Attention Backbone and Feature Fusion Pyramid Network

Exploring Multi-scale Deep Feature Fusion for Object Detection.

Removal then Selection: A Coarse-to-Fine Fusion Perspective for RGB-Infrared Object Detection

Multi-Modality Image Fusion and Object Detection Based on Semantic Information

Exploiting fusion architectures for multispectral pedestrian detection and segmentation