HSFusion: A high-level vision task-driven infrared and visible image fusion network via semantic and geometric domain transformation

Chengjie Jiang, Xiaowen Liu, Bowen Zheng, Lu Bai, Jing Li
2024-07-14
Abstract: Infrared and visible image fusion has developed from vision perception-oriented fusion methods to strategies that consider both vision perception and high-level vision tasks. However, existing task-driven methods fail to address the domain gap between semantic and geometric representation. To overcome this issue, we propose a high-level vision task-driven infrared and visible image fusion network via semantic and geometric domain transformation, termed HSFusion. Specifically, to minimize the gap between semantic and geometric representation, we design two separate domain transformation branches using the CycleGAN framework, each of which includes two processes: the forward segmentation process and the reverse reconstruction process. CycleGAN is capable of learning domain transformation patterns, and the reconstruction process of CycleGAN is conducted under the constraint of these patterns. Thus, our method can significantly facilitate the integration of semantic and geometric information and further reduce the domain gap. In the fusion stage, we integrate the infrared and visible features extracted from the reconstruction processes of the two separate CycleGANs to obtain the fused result. These features, containing varying proportions of semantic and geometric information, can significantly enhance high-level vision tasks. Additionally, we generate masks based on segmentation results to guide the fusion task. These masks provide semantic priors, and we design adaptive weights for the two distinct areas in the masks to facilitate image fusion. Finally, we conducted comparative experiments between our method and eleven other state-of-the-art methods, demonstrating that our approach surpasses the others in both visual appeal and the semantic segmentation task.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper "HSFusion: High-Level Vision Task-Driven Infrared and Visible Image Fusion Network Based on Semantic and Geometric Domain Transformation" aims to address the following issues:

1. **Domain Gap Between Semantic and Geometric Representations**:
   - Existing task-driven methods fail to effectively address the domain gap between semantic and geometric representations when fusing infrared and visible images. This leads to poor performance of the fusion results in high-level vision tasks (e.g., semantic segmentation).
2. **Joint Optimization of Fusion and High-Level Vision Tasks**:
   - Traditional image fusion methods usually focus only on visual perception, neglecting the needs of high-level vision tasks. To improve the effectiveness of fusion results in high-level vision tasks, a method that can simultaneously optimize fusion and high-level vision tasks is needed.
3. **Complementarity of Different Modal Information**:
   - Visible light sensors can clearly capture the texture details of objects but are easily affected by extreme conditions (e.g., darkness, strong light, or rain and fog). Infrared sensors capture information through thermal radiation, excelling at capturing object contours and being robust to environmental changes, but lack detailed texture information. Therefore, a method is needed to integrate information from these two modalities to meet the needs of visual perception and high-level vision tasks.

### Solution

To address the above issues, the authors propose a high-level vision task-driven infrared and visible image fusion network based on semantic and geometric domain transformation (HSFusion). Specifically:

1. **Dual Independent Pre-trained Feature Extractors**:
   - Two separate CycleGAN frameworks are used as feature extractors to process infrared and visible images, respectively. Each CycleGAN framework includes a forward segmentation process and a backward reconstruction process to learn stable domain transformation patterns.
2. **Adaptive Feature Fusion Network**:
   - During the fusion stage, masks are generated based on segmentation results, and an adaptive weighting strategy is designed to focus more on infrared features in thermal source areas and visible features in non-thermal source areas during the fusion process.
3. **Semantic Segmentation-Guided Fusion**:
   - The masks generated from semantic segmentation results guide the fusion process, enhancing the complementary semantic priors of different source images, thereby improving the performance of fusion and high-level vision tasks.

### Main Contributions

1. **Comprehensive Extraction of Semantic and Geometric Information**:
   - By using two independent pre-trained feature extractors, the semantic and geometric information of infrared and visible images is fully extracted, not only improving visual perception but also enhancing the semantic representation of the fusion results.
2. **Minimizing the Domain Gap Between Semantic and Geometric Information**:
   - The CycleGAN structure is used to learn the latent transformation patterns of different domains, integrating semantic and geometric information under these constraints.
3. **Guiding the Fusion Process with Semantic Masks**:
   - The generated semantic masks enhance the complementary semantic priors of different source images, further improving the performance of fusion and high-level vision tasks.
4. **Experimental Validation**:
   - Experimental results show that HSFusion achieves state-of-the-art performance in both visual perception and high-level semantic segmentation tasks.

Through these methods, HSFusion effectively addresses the issues present in existing methods when fusing infrared and visible images, enhancing the application value of fusion results in high-level vision tasks.
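
The mask-guided weighting described above can be illustrated with a small sketch. This is not the authors' implementation: HSFusion fuses learned features and designs adaptive weights, whereas the sketch below operates directly on pixel values with fixed, hand-picked weights. The function names (`thermal_mask`, `fuse_with_mask`), the choice of which segmentation classes count as thermal sources, and the weight values `0.8` and `0.3` are all assumptions made for illustration.

```python
from typing import List, Set

Image = List[List[float]]  # a 2D grid of intensities in [0, 1]


def thermal_mask(seg_labels: List[List[int]], thermal_classes: Set[int]) -> Image:
    """Build a binary mask from a segmentation map.

    A pixel is 1.0 if its predicted class is in `thermal_classes`
    (a hypothetical set, e.g. the label for "person"), otherwise 0.0.
    """
    return [
        [1.0 if label in thermal_classes else 0.0 for label in row]
        for row in seg_labels
    ]


def fuse_with_mask(
    ir: Image,
    vis: Image,
    mask: Image,
    w_ir_thermal: float = 0.8,      # assumed infrared weight inside thermal areas
    w_ir_background: float = 0.3,   # assumed infrared weight elsewhere
) -> Image:
    """Blend infrared and visible inputs with region-dependent weights.

    Inside the mask the infrared input dominates; outside it the
    visible input dominates. The two weights per pixel sum to 1.
    """
    fused: Image = []
    for ir_row, vis_row, mask_row in zip(ir, vis, mask):
        out_row = []
        for ir_px, vis_px, m in zip(ir_row, vis_row, mask_row):
            w_ir = m * w_ir_thermal + (1.0 - m) * w_ir_background
            out_row.append(w_ir * ir_px + (1.0 - w_ir) * vis_px)
        fused.append(out_row)
    return fused


if __name__ == "__main__":
    seg = [[0, 1], [2, 0]]            # toy segmentation map; class 1 is "thermal"
    ir = [[1.0, 1.0], [1.0, 1.0]]     # toy infrared input
    vis = [[0.0, 0.0], [0.0, 0.0]]    # toy visible input
    mask = thermal_mask(seg, {1})
    print(fuse_with_mask(ir, vis, mask))
```

In a learned setting the two scalar weights would typically be replaced by values the network predicts or by loss terms weighted per region, and the blend would be applied to feature maps rather than raw pixels.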