Abstract: Fusing images acquired by different sensors produces a single output with enhanced information for high-level visual perception applications. The transformer architecture has demonstrated a powerful ability to capture important global contextual dependencies for multi-modal image fusion tasks. However, transformer-based image fusion methods face several critical issues: a heavy computational burden, a limited ability to learn local features, and difficulty handling images of arbitrary sizes. To address these limitations, we propose a novel Laplacian Pyramid Hybrid (LapH) network that combines the advantages of CNN and transformer architectures for multi-modal image fusion. Following a divide-and-conquer philosophy, we first build a lightweight CNN-based branch that processes the high-resolution components, which carry abundant details in the lower levels of the Laplacian pyramid, and performs effective extraction and fusion of texture/edge features via central difference convolutions. We then design a transformer-based branch that processes the low-resolution base components, learning long-range dependencies of global contextual features without incurring extensive computational loads. Within this branch, a multi-scale recurrent modulation mechanism integrates the edge/texture features from the CNN branch as guidance to progressively refine feature extraction and fusion on the low-frequency components. Finally, we propose a new multi-scale spatial consistency loss term based on neighbor contrast in the source images, which yields fused images with a more natural and realistic appearance. Extensive experiments on two different multi-modal image fusion tasks verify the superiority of our method. The source code is publicly available at https://github.com/rgttadv/LapH .
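The divide-and-conquer step rests on splitting each source image into high-resolution detail levels and a low-resolution base. The following is a minimal NumPy sketch of that split under stated assumptions: 2×2 average pooling and nearest-neighbour upsampling stand in for the Gaussian filtering of a classic pyramid, edge padding handles odd sizes, and the helper names (`build_laplacian_pyramid`, `reconstruct`, `fuse_pyramids`) are hypothetical. The fusion rules in `fuse_pyramids` (max-absolute selection on detail levels, averaging on the base) are classic hand-crafted stand-ins, not the learned CNN and transformer branches of LapH.

```python
import numpy as np


def downsample(img):
    # Pad odd dimensions by edge replication so 2x2 average pooling is defined.
    h, w = img.shape
    padded = np.pad(img, ((0, h % 2), (0, w % 2)), mode="edge")
    H, W = padded.shape
    return padded.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))


def upsample(img, shape):
    # Nearest-neighbour 2x upsampling, cropped back to the target shape.
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[: shape[0], : shape[1]]


def build_laplacian_pyramid(img, levels):
    """Return (details, base); details[0] has the full input resolution."""
    details = []
    current = img.astype(np.float64)
    for _ in range(levels):
        smaller = downsample(current)
        # Detail level = what the coarser level cannot represent.
        details.append(current - upsample(smaller, current.shape))
        current = smaller
    return details, current


def reconstruct(details, base):
    # Invert the pyramid from coarse to fine; exact by construction.
    current = base
    for detail in reversed(details):
        current = upsample(current, detail.shape) + detail
    return current


def fuse_pyramids(img_a, img_b, levels=3):
    # Hypothetical hand-crafted rules, NOT the learned LapH branches:
    # detail levels keep the larger-magnitude coefficient (sharper edges),
    # the low-resolution base is simply averaged.
    da, ba = build_laplacian_pyramid(img_a, levels)
    db, bb = build_laplacian_pyramid(img_b, levels)
    fused_details = [np.where(np.abs(a) >= np.abs(b), a, b) for a, b in zip(da, db)]
    fused_base = 0.5 * (ba + bb)
    return reconstruct(fused_details, fused_base)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((37, 50))  # arbitrary, odd-sized input
    details, base = build_laplacian_pyramid(image, levels=3)
    print([d.shape for d in details], base.shape)  # [(37, 50), (19, 25), (10, 13)] (5, 7)
    print(np.allclose(reconstruct(details, base), image))  # True
```

Because each detail level is defined as the residual between a level and the upsampled next level, reconstruction is exact for any input size. The costly global modelling then only has to run on the small base (5×7 for a 37×50 input with three levels), which reflects the computational motivation for routing low-frequency components to the transformer branch.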

Multi-focus image fusion based on transformer and depth information learning

StackMFF: End-to-end Multi-Focus Image Stack Fusion Network

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Multi-focus Image Fusion Using Fully Convolutional Two-stream Network for Visual Sensors

CCSR-Net: Unfolding Coupled Convolutional Sparse Representation for Multi-focus Image Fusion

Multi-Focus Image Fusion Using U-Shaped Networks with a Hybrid Objective

Multi-focus image fusion with deep residual learning and focus property detection

Multi-focus image fusion with a deep convolutional neural network

Multi-Modal Image Fusion Via Deep Laplacian Pyramid Hybrid Network

Multi-focus image fusion based on guided filter and image matting network

UFA-FUSE: A novel deep supervised and hybrid model for multi-focus image fusion

Multi-focused image fusion algorithm based on multi-scale hybrid attention residual network

Combining transformers with CNN for multi-focus image fusion

Multi-Focus Image Fusion Based on Multi-Scale Gradients and Image Matting

Multi-Focus Image Fusion Algorithm in Sensor Networks

Multi-focus image fusion based on depth estimation in HSV space

A multi‐focus image fusion network deployed in smart city target detection

Focus Affinity Perception and Super-Resolution Embedding for Multifocus Image Fusion

Learning to Fuse Multi-Focus Image via Convolutional Network Modeling

MFST: Multi-Modal Feature Self-Adaptive Transformer for Infrared and Visible Image Fusion