TGIEN: An Interpretable Image Editing Method for IoT Applications Based on Text Guidance

Yifei Wang, Mengzhu Pan, Qianmu Li
DOI: https://doi.org/10.1109/ithings-greencom-cpscom-smartdata-cybermatics62450.2024.00051
2024-01-01
Abstract: In response to the demands and challenges of image editing in the Internet of Things (IoT) domain, we propose TGIEN, an interpretable text-guided image editing method that aims to overcome the limitations of existing image editing models based on Generative Adversarial Networks (GANs) and make them better suited to IoT applications. The method uses the Contrastive Language-Image Pre-training (CLIP) model to guide editing in the StyleGAN latent space, enabling real images to be manipulated through text prompts. A mapping module conditioned on text embeddings edits the StyleGAN latent code, and the model learns an attention map corresponding to the text at every layer of StyleGAN. These attention maps highlight the regions the model focuses on during editing, providing an intuitive explanation of its behavior. During image generation, the attention map also serves as a mask that guides local fusion between the original and modified features. In addition, the model incorporates editability losses so that edits align closely with the text prompts. This design explicitly achieves spatial disentanglement, improving the transparency and interpretability of the model and, in turn, the practicality of image editing technology for IoT applications and system optimization. Both qualitative and quantitative experiments show that the method outperforms other state-of-the-art methods in editing quality and edit disentanglement. In the IoT domain, the method holds promise for applications such as smart surveillance systems, smart healthcare, and virtual reality, offering more intelligent and efficient image processing and analysis in these scenarios.
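The attention-guided local fusion described above can be illustrated with a minimal sketch. The abstract does not state the fusion formula, so the sketch assumes a standard soft-mask blend, out = m * modified + (1 - m) * original, where m is the text-conditioned attention map with values in [0, 1]. The function names `local_fusion` and `fuse_all_layers` and the single-channel grid representation are hypothetical illustrations, not the authors' implementation.

```python
from typing import List

# Hypothetical representation: one single-channel H x W feature map.
Grid = List[List[float]]


def local_fusion(original: Grid, modified: Grid, mask: Grid) -> Grid:
    """Assumed soft-mask blend: out = mask * modified + (1 - mask) * original.

    Where the attention mask is 1 the edited feature is used; where it is 0
    the original feature is kept unchanged.
    """
    fused: Grid = []
    for row_o, row_e, row_m in zip(original, modified, mask):
        fused_row = []
        for o, e, m in zip(row_o, row_e, row_m):
            m = min(max(m, 0.0), 1.0)  # clamp so the mask is a valid blending weight
            fused_row.append(m * e + (1.0 - m) * o)
        fused.append(fused_row)
    return fused


def fuse_all_layers(originals: List[Grid], modifieds: List[Grid], masks: List[Grid]) -> List[Grid]:
    """Apply the same blend independently at each generator layer, one mask per layer."""
    return [local_fusion(o, e, m) for o, e, m in zip(originals, modifieds, masks)]


# Usage: the edit is applied fully at the top-left pixel, not at all at the
# top-right pixel, and partially where the mask takes intermediate values.
original = [[2.0, 2.0], [2.0, 2.0]]
modified = [[6.0, 6.0], [6.0, 6.0]]
mask = [[1.0, 0.0], [0.5, 0.25]]
result = local_fusion(original, modified, mask)
print(result)  # [[6.0, 2.0], [4.0, 3.0]]
```

Because regions with a mask value of 0 pass the original features through unchanged, an edit under this assumed rule cannot leak outside the attended area, which is one way to read the abstract's claim of explicit spatial disentanglement.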