Abstract: Fusion of images acquired using different sensors generates a single output with enhanced information for high-level visual perception applications. The transformer architecture has demonstrated a powerful ability to capture important global contextual dependencies in multi-modal image fusion tasks. However, transformer-based image fusion methods face several critical issues: heavy computational burdens, a limited ability to learn local features, and difficulty handling images of arbitrary sizes. To address these limitations, we propose a novel Laplacian Pyramid Hybrid (LapH) network that combines the advantages of CNN and transformer architectures for multi-modal image fusion tasks. Following a divide-and-conquer philosophy, we first build a lightweight CNN-based branch, which performs effective extraction and fusion of texture/edge features via central difference convolutions, to process the high-resolution components with abundant details encoded in the lower levels of the Laplacian pyramid. Then, we design a transformer-based branch to process the low-resolution base components, learning long-range dependencies of global contextual features without incurring extensive computational loads. Here, we design a multi-scale recurrent modulation mechanism that integrates the edge/texture features from the CNN branch as guidance to progressively refine feature extraction and fusion on the low-frequency components. Finally, we propose a new multi-scale spatial consistency loss term based on the neighbor contrast in the source images, generating fused images with more natural and realistic appearances. Extensive experiments on two different multi-modal image fusion tasks verify the superiority of our method. The source code is publicly available at https://github.com/rgttadv/LapH.
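
The divide-and-conquer split described in the abstract rests on the Laplacian pyramid, which separates an image into full-resolution detail layers and a small low-resolution base. The following is a minimal NumPy sketch of that decomposition only, not the authors' implementation (which is in the linked repository). It assumes a single-channel image whose sides are divisible by 2**levels, and it substitutes 2x2 average pooling and nearest-neighbour upsampling for the usual Gaussian filtering; the function names are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve resolution with 2x2 average pooling (assumed stand-in for Gaussian blur + decimation)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution with nearest-neighbour repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_laplacian_pyramid(img, levels):
    """Return [detail_0, ..., detail_{levels-1}, base], where detail_0 is full resolution."""
    pyramid = []
    current = img.astype(np.float64)
    for _ in range(levels):
        smaller = downsample(current)
        # The detail layer holds what is lost by going down one level (edges, texture).
        pyramid.append(current - upsample(smaller))
        current = smaller
    pyramid.append(current)  # low-resolution base component
    return pyramid

def reconstruct(pyramid):
    """Invert the decomposition by upsampling the base and adding the details back."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = upsample(current) + detail
    return current

# Example on a random 64x64 "image" with three pyramid levels.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
pyramid = build_laplacian_pyramid(image, levels=3)
shapes = [layer.shape for layer in pyramid]
restored = reconstruct(pyramid)
```

In this example the base is 8x8, so a global-attention module applied to it would see 64 times fewer pixels than the 64x64 input, while the detail layers keep the high-frequency content that a convolutional branch could process. Reconstruction recovers the input up to floating-point error.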

Multi-Dimensional Image Recovery Via Fully-Connected Tensor Network Decomposition under the Learnable Transforms

Multi-Dimensional Data Recovery Via Feature-Based Fully-Connected Tensor Network Decomposition

Nested Fully-Connected Tensor Network Decomposition for Multi-Dimensional Visual Data Recovery

Learnable Spatial-Spectral Transform-Based Tensor Nuclear Norm for Multi-Dimensional Visual Data Recovery

Multiplex Transformed Tensor Decomposition for Multidimensional Image Recovery

Multiscale Feature Tensor Train Rank Minimization for Multidimensional Image Recovery

Self-Supervised Nonlinear Transform-Based Tensor Nuclear Norm for Multi-Dimensional Image Recovery

Fully-connected Tensor Network Decomposition for Robust Tensor Completion Problem

A Biased Deep Tensor Factorization Network For Tensor Completion

Tensor Completion Via Fully-Connected Tensor Network Decomposition with Regularized Factors

A DCT-based Tensor Completion Approach for Recovering Color Images and Videos from Highly Undersampled Data

Nonlocal Patch-Based Fully Connected Tensor Network Decomposition for Multispectral Image Inpainting

Multi-Modal Image Fusion Via Deep Laplacian Pyramid Hybrid Network

CTFCD: Channel Transformer Based on Full Convolutional Decoder for Single Image Deraining

Functional Transform-Based Low-Rank Tensor Factorization for Multi-Dimensional Data Recovery

Adaptive Tensor Networks Decomposition for High-Order Tensor Recovery and Compression

A Learnable Group-Tube Transform Induced Tensor Nuclear Norm and Its Application for Tensor Completion

Multi-Tensor Network Representation for High-Order Tensor Completion

Multi-Dimensional Visual Data Completion via Low-Rank Tensor Representation Under Coupled Transform

"Sparse + Low-Rank" Tensor Completion Approach for Recovering Images and Videos

Low-rank Tensor Completion Via Combined Tucker and Tensor Train for Color Image Recovery