TeRF: Text-driven and Region-aware Flexible Visible and Infrared Image Fusion

Hebaixu Wang, Hao Zhang, Xunpeng Yi, Xinyu Xiang, Leyuan Fang, Jiayi Ma
DOI: https://doi.org/10.1145/3664647.3680971
2024-01-01
Abstract: The fusion of visible and infrared images aims to produce high-quality fused images with rich textures and salient target information. Existing methods lack interactivity and flexibility in how fusion is performed. Users cannot express requirements to modify the fusion effect, and different regions of the source images are treated identically by a single fusion model, which leads to homogenized results with low distinctiveness. Moreover, the pre-defined fusion strategies of these methods invariably produce monotonous effects that are not sufficiently comprehensive. They also fail to adequately account for data credibility, scene illumination, and the noise degradation inherent in the source information. To address these issues, we propose Text-driven and Region-aware Flexible visible and infrared image fusion, termed TeRF. On the one hand, we propose a flexible image fusion framework built on multiple large language and vision models, which facilitates visual-text interaction. On the other hand, we aggregate comprehensive fine-tuning paradigms for different fusion requirements into a unified fine-tuning pipeline. This allows regions and effects to be selected through language, yielding visually appealing fusion results. Extensive experiments demonstrate that our method is competitive with existing state-of-the-art methods both qualitatively and quantitatively. Our code is publicly available at https://github.com/Baixuzx7/TeRF.
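To make the contrast between region-agnostic and region-aware fusion concrete, here is a minimal, generic sketch. It is not TeRF's method, which relies on large language and vision models and a fine-tuning pipeline. The sketch assumes the selected region is already available as a boolean mask, for example one produced by a text-promptable segmentation model. It also assumes two hypothetical pixel rules chosen only for illustration: a max rule inside the region to keep bright infrared targets, and a mean rule elsewhere.

```python
from typing import List

Image = List[List[float]]  # grayscale image, intensities assumed in [0, 1]
Mask = List[List[bool]]    # True marks pixels in the (assumed) text-selected region


def max_rule(vis: float, ir: float) -> float:
    """Hypothetical rule: keep the brighter pixel to preserve salient targets."""
    return max(vis, ir)


def mean_rule(vis: float, ir: float) -> float:
    """Hypothetical rule: average both sources for background regions."""
    return 0.5 * (vis + ir)


def region_aware_fuse(vis: Image, ir: Image, mask: Mask) -> Image:
    """Apply max_rule inside the masked region and mean_rule everywhere else.

    The mask stands in for a region selected from a text prompt; how it is
    obtained is outside the scope of this sketch.
    """
    if not (len(vis) == len(ir) == len(mask)):
        raise ValueError("inputs must have the same height")
    fused: Image = []
    for row_v, row_i, row_m in zip(vis, ir, mask):
        if not (len(row_v) == len(row_i) == len(row_m)):
            raise ValueError("inputs must have the same width")
        fused.append([
            max_rule(v, i) if m else mean_rule(v, i)
            for v, i, m in zip(row_v, row_i, row_m)
        ])
    return fused


if __name__ == "__main__":
    vis = [[0.25, 0.75], [0.5, 0.5]]
    ir = [[1.0, 0.25], [0.0, 0.75]]
    mask = [[True, False], [False, True]]
    print(region_aware_fuse(vis, ir, mask))  # [[1.0, 0.5], [0.25, 0.75]]
```

A region-agnostic fuser would apply one rule to every pixel. Routing each pixel through a rule chosen by its region is the simplest way to show why region awareness can reduce the homogenization the abstract describes.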