DiffHarmony++: Enhancing Image Harmonization with Harmony-VAE and Inverse Harmonization Model

Pengfei Zhou, Fangxiang Feng, Guang Liu, Ruifan Li, Xiaojie Wang
DOI: https://doi.org/10.1145/3664647.3681466
2024-01-01
Abstract: Latent diffusion models have demonstrated impressive efficacy in image generation and editing tasks, and have recently advanced image harmonization as well. However, methods built on latent diffusion models all face a common challenge: the VAE component introduces severe image distortion, whereas image harmonization is a low-level image processing task evaluated with pixel-level metrics. In this paper, we propose Harmony-VAE, which leverages the input of the harmonization task itself to enhance the quality of decoded images. The input composite image contains precise pixel-level information, which complements the correct foreground appearance and color information contained in the denoised latents. Meanwhile, the inherently generative nature of diffusion models makes them naturally suited to inverse image harmonization, i.e., generating synthetic composite images from real images and foreground masks. We train an inverse harmonization diffusion model to perform data augmentation on two subsets of iHarmony4 and to construct a new human harmonization dataset with prominent foreground objects. Extensive experiments demonstrate the effectiveness of the proposed Harmony-VAE and inverse harmonization model. Code and pretrained models are available at https://github.com/nicecv/DiffHarmony.
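To make the inverse harmonization setup concrete, the following is a minimal sketch of how a synthetic composite can be assembled from a real image and a foreground mask, and of how a pixel-level error restricted to the foreground can be computed. It is not the paper's method: the learned inverse harmonization diffusion model is replaced here by a simple per-channel color perturbation, and the names `make_synthetic_composite`, `foreground_mse`, `gain`, and `bias` are hypothetical, chosen for illustration only.

```python
import numpy as np


def make_synthetic_composite(real, mask, gain, bias):
    """Blend a color-perturbed foreground back into the real image.

    real: float array of shape (H, W, 3) with values in [0, 1]
    mask: float array of shape (H, W), 1 for foreground, 0 for background
    gain, bias: per-channel color perturbation; an assumed stand-in for
        the learned inverse harmonization model, not the paper's approach
    """
    perturbed = np.clip(real * gain + bias, 0.0, 1.0)
    m = mask[..., None]  # broadcast the mask over the color channels
    # Foreground comes from the perturbed image, background stays real.
    return m * perturbed + (1.0 - m) * real


def foreground_mse(pred, target, mask):
    """Mean squared error computed over foreground pixels only."""
    m = mask[..., None]
    n = m.sum() * pred.shape[-1]  # number of foreground values
    return float((((pred - target) ** 2) * m).sum() / max(n, 1.0))


# Toy data: a random 4x4 "real" image with a 2x2 foreground region.
rng = np.random.default_rng(0)
real = rng.random((4, 4, 3))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0

composite = make_synthetic_composite(
    real,
    mask,
    gain=np.array([1.2, 0.9, 1.0]),
    bias=np.array([0.05, 0.0, -0.05]),
)
```

In this sketch the pair (`composite`, `real`) plays the role of one augmented training example: the composite is the harmonization input and the real image is the target, with the background left pixel-identical.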