Where You Edit is What You Get: Text-guided Image Editing with Region-Based Attention.

Changming Xiao, Qi Yang, Xiaoqiang Xu, Jianwei Zhang, Feng Zhou, Changshui Zhang
DOI: https://doi.org/10.1016/j.patcog.2023.109458
IF: 8
2023-01-01
Pattern Recognition
Abstract: Leveraging the abundant knowledge learned by pre-trained multi-modal models like CLIP has recently proved effective for text-guided image editing. Though convincing results have been achieved by combining the image generator StyleGAN with CLIP, most methods need to train separate models for different prompts, and irrelevant regions are often changed after editing due to the lack of spatial disentanglement. We propose a novel framework that can edit different images according to different prompts in one model. In addition, an innovative region-based spatial attention mechanism is adopted to explicitly guarantee the locality of editing. Experiments, mainly in the face domain, verify the feasibility of our framework and show that, once multi-text editing and local editing are achievable, our method can support practical applications like sequential editing and regional style transfer. (c) 2023 Elsevier Ltd. All rights reserved.