SAMControl: Controlling Pose and Object for Image Editing with Soft Attention Mask

Yue Zhang, Chao Wang, Feifei Fang, Yunzhi Zhuge, Hehe Fan, Xiaojun Chang, Cheng Deng, Yi Yang
DOI: https://doi.org/10.1145/3702999
2024-01-01
Abstract: To achieve content-consistent results in text-conditioned image editing, existing methods typically employ a reconstruction branch to capture the source image details via diffusion inversion and a generation branch to synthesize the target image based on the given textual prompt and the masked source image details. However, accurately segmenting source details is challenging with the current fixed-threshold mask strategy. Additionally, inadequacies in the inversion process can lead to insufficient retention of source details. In this paper, we propose a method called SAMControl (Soft Attention Mask) to adaptively control the pose and object details for image editing. SAMControl dynamically learns flexible attention masks for different images at various diffusion steps. Furthermore, in the reconstruction branch, we utilize a direct inversion technique to ensure the fidelity of source details within SAM. Extensive qualitative and quantitative results demonstrate the effectiveness of the proposed method.
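To illustrate the contrast the abstract draws between a fixed-threshold mask and a soft mask, the following is a minimal sketch, not the paper's implementation. The abstract does not specify how SAMControl learns its masks, so the sigmoid form and the names `hard_mask`, `soft_mask`, `blend`, `center`, and `sharpness` are assumptions made purely for illustration; in the actual method the mask is learned per image and per diffusion step rather than set by hand.

```python
import math

def normalize(attn):
    """Rescale attention scores to [0, 1]; a constant map becomes all zeros."""
    lo, hi = min(attn), max(attn)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in attn]
    return [(a - lo) / span for a in attn]

def hard_mask(attn, threshold=0.5):
    """Fixed-threshold binary mask: 1 where normalized attention exceeds the threshold."""
    return [1.0 if a > threshold else 0.0 for a in normalize(attn)]

def soft_mask(attn, center=0.5, sharpness=10.0):
    """Hypothetical soft mask: a sigmoid around `center`, giving values strictly in (0, 1).

    `center` and `sharpness` are fixed here for illustration; they stand in for
    quantities that an adaptive method would learn.
    """
    return [1.0 / (1.0 + math.exp(-sharpness * (a - center))) for a in normalize(attn)]

def blend(source, target, mask):
    """Mix features element-wise: mask=1 keeps the target, mask=0 keeps the source."""
    return [m * t + (1.0 - m) * s for s, t, m in zip(source, target, mask)]

if __name__ == "__main__":
    attn = [0.0, 0.4, 0.6, 1.0]   # toy attention scores for four positions
    source = [1.0, 1.0, 1.0, 1.0]  # toy source-branch features
    target = [3.0, 3.0, 3.0, 3.0]  # toy generation-branch features
    print("hard:", blend(source, target, hard_mask(attn)))
    print("soft:", blend(source, target, soft_mask(attn)))
```

With a hard mask, positions near the threshold flip entirely to one branch; with a soft mask they receive a weighted mixture, which is the kind of flexibility the abstract motivates.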