Image harmonization with Simple Hybrid CNN-Transformer Network

Guanlin Li,Bin Zhao,Xuelong Li
DOI: https://doi.org/10.1016/j.neunet.2024.106673
2024-08-30
Abstract:Image harmonization seeks to transfer the illumination distribution of the background to that of the foreground within a composite image. Existing methods lack the ability of establishing global-local pixel illumination dependencies between foreground and background of composite images, which is indispensable for sharp and color-consistent harmonized image generation. To overcome this challenge, we design a novel Simple Hybrid CNN-Transformer Network (SHT-Net), which is formulated into an efficient symmetrical hierarchical architecture. It is composed of two newly designed light-weight Transformer blocks. Firstly, the scale-aware gated block is designed to capture multi-scale features through different heads and expand the receptive fields, which facilitates to generate images with fine-grained details. Secondly, we introduce a simple parallel attention block, which integrates the window-based self-attention and gated channel attention in parallel, resulting in simultaneously global-local pixel illumination relationship modeling capability. Besides, we propose an efficient simple feed forward network to filter out less informative features and allow the features to contribute to generating photo-realistic harmonized results passing through. Extensive experiments on image harmonization benchmarks indicate that our method achieve promising quantitative and qualitative results. The code and pre-trained models are available at https://github.com/guanguanboy/SHT-Net.
What problem does this paper attempt to address?