Abstract: Visible and Infrared Image Fusion (VIF) has garnered significant interest across a wide range of high-level vision tasks, such as object detection and semantic segmentation. However, the evaluation of VIF methods remains challenging due to the absence of ground truth. This paper proposes a Segmentation-oriented Evaluation Approach (SEA) to assess VIF methods by incorporating the semantic segmentation task and leveraging segmentation labels available in latest VIF datasets. Specifically, SEA utilizes universal segmentation models, capable of handling diverse images and classes, to predict segmentation outputs from fused images and compare these outputs with segmentation labels. Our evaluation of recent VIF methods using SEA reveals that their performance is comparable or even inferior to using visible images only, despite nearly half of the infrared images demonstrating better performance than visible images. Further analysis indicates that the two metrics most correlated to our SEA are the gradient-based fusion metric $Q_{\text{ABF}}$ and the visual information fidelity metric $Q_{\text{VIFF}}$ in conventional VIF evaluation metrics, which can serve as proxies when segmentation labels are unavailable. We hope that our evaluation will guide the development of novel and practical VIF methods. The code has been released at https://github.com/Yixuan-2002/SEA/.

What problem does this paper attempt to address?

The paper addresses the difficulty of evaluating visible and infrared image fusion (VIF) methods. Because fused images have no ground truth, existing VIF evaluation methods are limited. The paper proposes a Segmentation-oriented Evaluation Approach (SEA), which assesses the quality of a fused image through the semantic segmentation task. Specifically, SEA uses a universal segmentation model to predict segmentation outputs from the fused image and compares those outputs with segmentation labels to score the fusion method.

### Main Contributions

1. **Proposing a new evaluation method**: SEA works around the lack of ground truth in VIF evaluation by using universal segmentation models together with the segmentation labels available in recent VIF datasets, and it applies to multiple classes across different datasets.
2. **Comprehensive comparative study**: 30 recent VIF methods were evaluated with SEA and with 15 conventional evaluation metrics on recent datasets.
3. **Correlation analysis**: Statistical correlation measures were used to assess the agreement between SEA and the conventional metrics. $Q_{\text{ABF}}$ and $Q_{\text{VIFF}}$ were the two most correlated metrics and can serve as proxies when segmentation labels are unavailable.

### Background

- **Visible and infrared image fusion**: Visible images provide rich color and texture information but are strongly affected by environmental conditions; infrared images highlight targets but lack color and texture. Fusing the two modalities is therefore expected to improve performance on vision tasks.
- **Evaluation challenges**: Without ground truth for the fused image, existing VIF evaluation methods struggle to measure fusion quality accurately.

### Method

- **Universal segmentation models**: Three recent universal segmentation models were selected: X-Decoder, SEEM, and G-SAM. These models can handle diverse images and classes.
- **Evaluation process**:
  1. Generate the fused image with the VIF method.
  2. Use a universal segmentation model to predict the segmentation output from the fused image.
  3. Compare the predicted segmentation with the annotated segmentation labels and compute the mIoU score as the measure of fusion quality.

### Experimental Results

- **Performance comparison**: Recent VIF methods perform comparably to, or even worse than, using visible images alone, even though nearly half of the infrared images yield better segmentation than their visible counterparts.
- **Correlation analysis**: $Q_{\text{ABF}}$ and $Q_{\text{VIFF}}$ correlate most strongly with SEA and can be used as evaluation metrics when segmentation labels are unavailable.

### Conclusion

SEA offers a task-grounded way to evaluate VIF methods. The authors hope it will guide the development of future VIF methods and improve both fused-image quality and downstream vision-task performance.
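The scoring step of the evaluation process and the correlation analysis described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (that lives in the linked repository). The function names `mean_iou` and `spearman`, the `ignore_index` convention, and the choice of Spearman rank correlation are all assumptions made for illustration: the summary only says "statistical correlation measurement" without naming a specific coefficient. Masks are plain nested lists of class ids, and the toy numbers in the usage section are made up, not results from the paper.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # 2-D grid of integer class ids


def mean_iou(pred: Mask, label: Mask, num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean intersection-over-union over classes present in pred or label.

    Assumes label values are either valid class ids or `ignore_index`
    (a hypothetical convention; datasets differ).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, label_row in zip(pred, label):
        for p, g in zip(pred_row, label_row):
            if g == ignore_index:
                continue  # unlabeled pixel: excluded from the score
            if p == g:
                inter[g] += 1
                union[g] += 1
            else:
                union[g] += 1          # pixel belongs to the label's class
                if 0 <= p < num_classes:
                    union[p] += 1      # and to the predicted class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for constant input; 0.0 is a placeholder
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Toy 2x2 masks standing in for a segmentation model's output on a
    # fused image and the dataset's annotation.
    pred = [[0, 0], [1, 1]]
    label = [[0, 1], [1, 1]]
    print("mIoU:", mean_iou(pred, label, num_classes=2))

    # Made-up per-method scores: one SEA-style mIoU and one conventional
    # metric value per fusion method.
    sea_scores = [0.41, 0.47, 0.39, 0.52]
    metric_scores = [0.55, 0.61, 0.50, 0.58]
    print("rank correlation:", spearman(sea_scores, metric_scores))
```

In practice the per-pixel loop would be replaced by array operations, and a library routine such as `scipy.stats.spearmanr` could stand in for the hand-written rank correlation; the pure-Python version is used here only to keep the sketch self-contained.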

Rethinking the Evaluation of Visible and Infrared Image Fusion
