Abstract: Multi-modal image fusion aggregates information from multiple sensor sources, achieving superior visual quality and perceptual characteristics compared to any single source, often enhancing downstream tasks. However, current fusion methods for downstream tasks still use predefined fusion objectives that potentially mismatch the downstream tasks, limiting adaptive guidance and reducing model flexibility. To address this, we propose Task-driven Image Fusion (TDFusion), a fusion framework incorporating a learnable fusion loss guided by task loss. Specifically, our fusion loss includes learnable parameters modeled by a neural network called the loss generation module. This module is supervised by the loss of downstream tasks in a meta-learning manner. The learning objective is to minimize the task loss of the fused images, once the fusion module has been optimized by the fusion loss. Iterative updates between the fusion module and the loss module ensure that the fusion network evolves toward minimizing task loss, guiding the fusion process toward the task objectives. TDFusion's training relies solely on the loss of downstream tasks, making it adaptable to any specific task. It can be applied to any architecture of fusion and task networks. Experiments demonstrate TDFusion's performance in both fusion and task-related applications, including four public fusion datasets, semantic segmentation, and object detection. The code will be released.

What problem does this paper attempt to address?

This paper addresses the limitations of current multi-modal image fusion methods when their output feeds downstream tasks. Existing fusion methods still rely on predefined fusion objectives, which may not match the downstream task, limiting adaptive guidance and reducing model flexibility. To address this, the authors propose a task-driven image fusion framework (TDFusion), which introduces a learnable fusion loss so that the fusion results better fit the specific requirements of the downstream task.

### Specific description of the problem

1. **Limitations of predefined fusion objectives**:
   - Current fusion methods usually use predefined fusion loss functions, which may not be well adapted to specific downstream tasks (such as semantic segmentation or object detection). This makes the model less flexible and less robust across tasks.
2. **Lack of dynamic adaptability**:
   - Existing frameworks, although they integrate downstream tasks, still rely on fixed fusion loss terms that cannot adapt dynamically to different task requirements. This static loss design limits how well the fusion network adapts to specific image pairs.
3. **Insufficient information retention**:
   - Traditional fusion methods focus mainly on visual-level information aggregation and neglect crucial semantic information during feature extraction, which hurts scene understanding and task performance.

### Proposed solution

To overcome these problems, the authors propose the TDFusion framework, whose core features include:

- **Task-driven learnable fusion loss**:
  - TDFusion introduces a learnable fusion loss, which is generated by a Loss Generation Module and optimized through meta-learning. This loss is adjusted dynamically according to the downstream task loss, so that the fusion results better meet the task requirements.
- **Alternating update mechanism**:
  - The fusion module and the loss generation module are updated alternately. In each iteration, the fusion module is first optimized through an inner update, and then the loss generation module is optimized through an outer update. This lets the fusion loss continuously guide the fusion process toward minimizing the downstream task loss.
- **Flexibility and adaptability**:
  - TDFusion can be applied to fusion networks and task networks of any architecture, and its training depends solely on the downstream task loss, which makes it flexible and adaptable.

### Experimental verification

The paper evaluates TDFusion on multiple public datasets (such as MSRS, FMB, M3FD, and LLVIP). The reported results show that TDFusion outperforms existing methods in both fusion quality and downstream task performance, especially on images with diverse characteristics and on challenging fusion tasks.

In summary, this paper aims to overcome the limitations of existing fusion methods in downstream tasks by introducing a task-driven learnable fusion loss, thereby improving both the quality of the fused images and task performance.
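The alternating inner and outer updates described above can be illustrated with a deliberately tiny sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the "fusion module" is a single blend weight `theta`, the "loss generation module" is a single parameter `phi` mapped to a weight `w` by a sigmoid, the "task loss" is a squared error against a hypothetical `target`, and finite differences stand in for backpropagation through the inner step.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse(theta, a, b):
    # Toy "fusion module" (assumption): one blend weight shared by all pixels.
    return [theta * x + (1.0 - theta) * y for x, y in zip(a, b)]

def fusion_loss(theta, phi, a, b):
    # Toy "loss generation module" (assumption): phi yields a weight w in (0, 1)
    # that sets how strongly the fused image is pulled toward each source.
    w = sigmoid(phi)
    fused = fuse(theta, a, b)
    terms = [w * (f - x) ** 2 + (1.0 - w) * (f - y) ** 2
             for f, x, y in zip(fused, a, b)]
    return sum(terms) / len(terms)

def task_loss(theta, a, b, target):
    # Stand-in for a downstream loss such as segmentation or detection.
    fused = fuse(theta, a, b)
    return sum((f - t) ** 2 for f, t in zip(fused, target)) / len(fused)

def num_grad(fn, x, eps=1e-4):
    # Central finite difference; replaces automatic differentiation here.
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)

def inner_step(theta, phi, a, b, lr_in):
    # One gradient step on the fusion module under the current fusion loss.
    return theta - lr_in * num_grad(lambda t: fusion_loss(t, phi, a, b), theta)

def train(a, b, target, steps=1000, lr_in=0.5, lr_out=5.0):
    theta, phi = 0.5, 0.0
    for _ in range(steps):
        # Inner update: optimize the fusion module with the fusion loss.
        theta_new = inner_step(theta, phi, a, b, lr_in)

        # Outer update: task loss of the *updated* fusion module,
        # differentiated with respect to phi through the inner step.
        def task_after_inner(p):
            return task_loss(inner_step(theta, p, a, b, lr_in), a, b, target)

        phi -= lr_out * num_grad(task_after_inner, phi)
        theta = theta_new
    return theta, phi

# Two toy "source images" and a target that the task prefers (80% a, 20% b).
a = [1.0, 0.8, 0.2]
b = [0.0, 0.4, 0.6]
target = [0.8 * x + 0.2 * y for x, y in zip(a, b)]

theta, phi = train(a, b, target)
print(theta, sigmoid(phi), task_loss(theta, a, b, target))
```

In this toy setting the fusion module never sees `target`; it is trained only by the fusion loss. The task loss reaches it indirectly, by reshaping the fusion loss through `phi`, so both `theta` and `sigmoid(phi)` should drift toward the blend the task prefers. A real system would replace the scalars with networks and the finite differences with second-order automatic differentiation.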

Task-driven Image Fusion with Learnable Fusion Loss

TDDFusion: A Target-Driven Dual Branch Network for Infrared and Visible Image Fusion

ReFusion: Learning Image Fusion from Reconstruction with Learnable Loss via Meta-Learning

A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion

A General Image Fusion Framework Using Multi-Task Semi-Supervised Learning

Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond

Multi-Modal Image Fusion Via Deep Laplacian Pyramid Hybrid Network

Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond

Image Fusion Based on Feature Decoupling and Proportion Preserving

TransFuse: A Unified Transformer-based Image Fusion Framework using Self-supervised Learning

DCFusion: Difference correlation-driven fusion mechanism of infrared and visible images

Trans2Fuse: Empowering image fusion through self-supervised learning and multi-modal transformations via transformer networks

FusionBooster: A Unified Image Fusion Boosting Paradigm

Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network

U2Fusion: A Unified Unsupervised Image Fusion Network

CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion

Task-Driven Dynamic Fusion: Reducing Ambiguity in Video Description

Different Input Resolutions and Arbitrary Output Resolution: A Meta Learning-Based Deep Framework for Infrared and Visible Image Fusion

FuseFormer: A Transformer for Visual and Thermal Image Fusion

UNIFusion: A Lightweight Unified Image Fusion Network