l0-Regularized Sparse Coding-based Interpretable Network for Multi-Modal Image Fusion

Gargi Panda, Soumitra Kundu, Saumik Bhattacharya, Aurobinda Routray
2024-11-07
Abstract: Multi-modal image fusion (MMIF) enhances the information content of the fused image by combining the unique as well as common features obtained from different modality sensor images, improving visualization, object detection, and many more tasks. In this work, we introduce an interpretable network for the MMIF task, named FNet, based on an l0-regularized multi-modal convolutional sparse coding (MCSC) model. Specifically, for solving the l0-regularized CSC problem, we develop an algorithm unrolling-based l0-regularized sparse coding (LZSC) block. Given different modality source images, FNet first separates the unique and common features from them using the LZSC block and then these features are combined to generate the final fused image. Additionally, we propose an l0-regularized MCSC model for the inverse fusion process. Based on this model, we introduce an interpretable inverse fusion network named IFNet, which is utilized during FNet's training. Extensive experiments show that FNet achieves high-quality fusion results across five different MMIF tasks. Furthermore, we show that FNet enhances downstream object detection in visible-thermal image pairs. We have also visualized the intermediate results of FNet, which demonstrates the good interpretability of our network.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses several problems in multi-modal image fusion (MMIF), with the aim of improving both the quality of fused images and the performance of downstream tasks. Specifically:

1. **Improve the quality of fused images**: By combining the unique and common features of images from different modality sensors, the method generates fused images that carry more information, which benefits tasks such as visualization and object detection.
2. **Introduce ℓ₀-regularized sparse coding**: To address the limitations of existing methods in estimating sparse features, the paper proposes a multi-modal convolutional sparse coding (MCSC) model based on ℓ₀ regularization. Compared with the traditional ℓ₁ regularization, ℓ₀ regularization can estimate sparse features more accurately because it does not over-penalize features with large absolute values.
3. **Develop an interpretable network architecture**: To improve interpretability, the authors design a network named FNet, which is based on the proposed ℓ₀-regularized MCSC model and uses a new LZSC block to solve the ℓ₀-regularized CSC problem. An inverse fusion network, IFNet, is also proposed. During training it constrains the source images decomposed from the fused image to be similar to the original source images, which further improves the quality of the fused images.
4. **Solve the unsupervised training problem**: Because fused images have no ground-truth labels, the paper proposes a two-stage training method. In the first stage, FNet and IFNet are trained jointly so that the source images produced by inverse fusion are similar to the original source images. In the second stage, only FNet is trained, by maximizing the similarity between the fused images and the source images.

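
The ℓ₀ versus ℓ₁ point above can be made concrete with the two standard thresholding operators. The sketch below is only an illustration and is not taken from the paper: the function names, the sample `features` vector, and the value of `lam` are assumptions. It relies on the well-known facts that the proximal operator of the ℓ₁ penalty is soft thresholding and that of the ℓ₀ penalty is hard thresholding.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_1.

    Every surviving entry is shrunk toward zero by lam, so large
    features are penalized as much as small ones.
    """
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def hard_threshold(v, lam):
    """Proximal operator of lam * ||v||_0 (with a 0.5 * squared-error term).

    Entries with magnitude at most sqrt(2 * lam) are zeroed;
    all other entries pass through unchanged.
    """
    return np.where(np.abs(v) > np.sqrt(2.0 * lam), v, 0.0)

# Hypothetical feature values, chosen only for illustration.
features = np.array([-3.0, -0.2, 0.1, 0.8, 5.0])
lam = 0.5

soft = soft_threshold(features, lam)
hard = hard_threshold(features, lam)
print("soft (l1):", soft)
print("hard (l0):", hard)
```

In this example the large entries -3.0 and 5.0 become -2.5 and 4.5 under soft thresholding but are left unchanged by hard thresholding, which is the bias the summary refers to as over-penalizing large features.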

With these improvements, FNet achieves leading results on multiple multi-modal image fusion tasks and also improves downstream tasks such as object detection on visible-thermal image pairs.

### Summary of main contributions:

1. **Developed the first learnable LZSC block**: Used to solve the ℓ₀-regularized CSC problem.
2. **Proposed the MCSC model based on ℓ₀ regularization**: Used to represent the multi-modal image fusion process.
3. **Designed the inverse fusion network IFNet**: Used during FNet's training to improve the quality of the fused images.
4. **Achieved leading results in five MMIF tasks**: Visible-infrared (VIS-IR), visible-near-infrared (VIS-NIR), CT-MRI, PET-MRI, and SPECT-MRI image fusion.

Together, these contributions give FNet strong fused-image quality and make it useful for downstream tasks.
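
For readers unfamiliar with algorithm unrolling, the following sketch may help. It runs a fixed number of iterative hard-thresholding steps on an ℓ₀-regularized sparse coding problem with a dense dictionary. It is not the paper's LZSC block: the function `unrolled_iht`, the dense matrix `D`, the fixed step size, and the synthetic data are all assumptions made for illustration. A learned, convolutional version would typically replace `D` with convolution filters and treat the step size and threshold of each layer as trainable parameters.

```python
import numpy as np

def hard_threshold(v, tau):
    """Zero every entry whose magnitude is at most tau; keep the rest unchanged."""
    return np.where(np.abs(v) > tau, v, 0.0)

def objective(x, D, z, lam):
    """0.5 * ||x - D z||^2 + lam * ||z||_0."""
    return 0.5 * np.sum((x - D @ z) ** 2) + lam * np.count_nonzero(z)

def unrolled_iht(x, D, lam, num_layers=50):
    """Fixed-depth iterative hard thresholding (illustrative, not the paper's method).

    Each "layer" is one proximal-gradient step on
    0.5 * ||x - D z||^2 + lam * ||z||_0.
    """
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / (largest singular value)^2
    tau = np.sqrt(2.0 * lam * step)         # threshold of the scaled l0 prox
    z = np.zeros(D.shape[1])
    for _ in range(num_layers):
        gradient_step = z + step * (D.T @ (x - D @ z))
        z = hard_threshold(gradient_step, tau)
    return z

# Synthetic example: a signal built from two dictionary atoms (assumed data).
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 40))
z_true = np.zeros(40)
z_true[[3, 17]] = [2.0, -1.5]
x = D @ z_true

z_hat = unrolled_iht(x, D, lam=0.05)
print("non-zeros in estimate:", np.count_nonzero(z_hat))
print("objective:", objective(x, D, z_hat, lam=0.05))
```

The fixed loop length is what makes the procedure "unrolled": each iteration corresponds to one network layer, so intermediate estimates of `z` can be inspected layer by layer, which is the kind of interpretability the summary attributes to FNet.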