Multispectral Fusion for Object Detection with Cyclic Fuse-and-Refine Blocks

Heng Zhang, Elisa Fromont, Sébastien Lefevre, Bruno Avignon
DOI: https://doi.org/10.48550/arXiv.2009.12664
2020-09-27
Abstract: Multispectral images (e.g. visible and infrared) may be particularly useful when detecting objects with the same model in different environments (e.g. day/night outdoor scenes). To effectively use the different spectra, the main technical problem resides in the information fusion process. In this paper, we propose a new halfway feature fusion method for neural networks that leverages the complementary/consistency balance existing in multispectral features by adding to the network architecture, a particular module that cyclically fuses and refines each spectral feature. We evaluate the effectiveness of our fusion method on two challenging multispectral datasets for object detection. Our results show that implementing our Cyclic Fuse-and-Refine module in any network improves the performance on both datasets compared to other state-of-the-art multispectral object detection methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to fuse information from multispectral images (such as visible-light and infrared images) effectively when a single model must detect objects in different environments (for example, outdoor scenes by day and by night). Specifically, it asks how to design a feature-fusion method for neural networks that balances the complementarity and consistency of multispectral features, thereby improving object-detection performance.

### Main Problems

1. **Complementarity and Inconsistency of Multispectral Images**:
   - Visible-light images usually provide color and texture details, while infrared images respond to object temperature, which is especially useful at night.
   - Because the spectra provide different views of the same scene, the extracted features may be inconsistent, which makes fusion difficult and error-prone.
2. **Limitations of Existing Methods**:
   - Early-fusion methods may not fully exploit the complementary information of the different spectra.
   - Late-fusion methods may not handle the inconsistency between spectra effectively.

### Solution

The paper proposes a new feature-fusion method, the **Cyclic Fuse-and-Refine (CFR) module**, which gradually improves the balance between feature consistency and complementarity by cyclically fusing and refining each spectral feature several times inside the network.

### Specific Methods

1. **Feature Fusion and Refinement**:
   - In each cycle \(i\), let \(f_i^f\) denote the fused feature, \(f_i^v\) the visible-light feature, and \(f_i^t\) the infrared (thermal) feature. Multispectral feature fusion can be represented as: \[ f_i^f = F(\sigma(f_{i-1}^t, f_{i-1}^v)) \] where \(\sigma\) is the feature concatenation operation and \(F\) is a \(3\times3\) convolutional layer followed by batch normalization.
   - The fused feature is then added as a residual to each spectral feature for refinement: \[ f_i^t = H(f_{i-1}^t + f_i^f), \quad f_i^v = H(f_{i-1}^v + f_i^f) \] where \(H\) is an activation function (such as ReLU).
2. **Semantic Supervision**:
   - To prevent the vanishing-gradient problem and better guide the multispectral feature fusion, an auxiliary semantic-segmentation task provides separate supervision for each refined spectral feature.
   - Two pedestrian segmentation masks are predicted through a \(1\times1\) convolutional layer, one for the visible-light channel and one for the infrared channel.
3. **Final Fusion**:
   - Since the optimal number of cycles is unknown and may vary from image pair to image pair, all refined spectral features are aggregated to produce the final fused feature for the object-detection part of the network.
   - The aggregation is a simple element-wise average over the \(I\) cycles: \[ \frac{1}{2I}\left(\sum_{i=1}^I f_i^t + \sum_{i=1}^I f_i^v\right) \]

### Experimental Results

The paper evaluates the CFR module on two challenging multispectral datasets:

- **KAIST Multispectral Pedestrian Detection Dataset**: The model using the CFR module is reported to outperform other state-of-the-art multispectral object-detection methods in detection accuracy.
- **FLIR ADAS Dataset**: The CFR module is also reported to yield a notable mAP gain on this dataset.

### Conclusion

By introducing the Cyclic Fuse-and-Refine module, the paper improves the balance between the consistency and complementarity of multispectral features, which in turn improves object-detection performance on both datasets.
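
The fusion, refinement, and final-averaging equations above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`conv3x3`, `batch_norm`, `relu`, `cfr`) are hypothetical, each cycle is given its own random convolution weights, batch normalization is reduced to per-channel normalization of a single sample without learned scale or shift, and the auxiliary segmentation supervision is omitted.

```python
import numpy as np


def conv3x3(x, w):
    """Zero-padded 3x3 convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            # Contract the input channels for this kernel offset.
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx],
                             xp[:, dy:dy + h, dx:dx + wd])
    return out


def batch_norm(x, eps=1e-5):
    """Assumption: per-channel normalization of one sample, no learned parameters."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def relu(x):
    return np.maximum(x, 0.0)


def cfr(f_t, f_v, weights):
    """Run one fuse-and-refine cycle per weight tensor, then average all refined features."""
    refined = []
    for w in weights:
        # f_i^f = F(sigma(f_{i-1}^t, f_{i-1}^v)): concatenate, 3x3 conv, normalize.
        fused = batch_norm(conv3x3(np.concatenate([f_t, f_v], axis=0), w))
        # f_i^t = H(f_{i-1}^t + f_i^f), f_i^v = H(f_{i-1}^v + f_i^f) with H = ReLU.
        f_t = relu(f_t + fused)
        f_v = relu(f_v + fused)
        refined += [f_t, f_v]
    # Element-wise mean over the 2I refined features.
    return np.mean(refined, axis=0)


rng = np.random.default_rng(0)
C, H, W, I = 4, 8, 8, 3  # hypothetical channel count, spatial size, and cycle count
f_t = rng.standard_normal((C, H, W))
f_v = rng.standard_normal((C, H, W))
weights = [0.1 * rng.standard_normal((C, 2 * C, 3, 3)) for _ in range(I)]
final = cfr(f_t, f_v, weights)
print(final.shape)  # (4, 8, 8)
```

Because the fused feature is added to both spectral features, each cycle pulls the two branches toward a shared representation while the residual path keeps their spectrum-specific content. Averaging every cycle's output, rather than taking only the last one, avoids having to choose a single number of cycles.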