InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning

Tiancheng Li, Jinxiu Liu, Huajun Chen, Qi Liu
2024-06-14
Abstract: Instruction-based image editing has made great progress in using natural human language to manipulate the visual content of images. However, existing models are limited by the quality of the dataset and cannot accurately localize editing regions in images with complex object relationships. In this paper, we propose a Reinforcement Learning Guided Image Editing Method (InstructRL4Pix) to train a diffusion model to generate images that are guided by the attention maps of the target object. Our method uses the distance between attention maps as a reward function and fine-tunes the diffusion model with proximal policy optimization (PPO) to maximize that reward. We evaluate our model on object insertion, removal, replacement, and transformation. Experimental results show that InstructRL4Pix breaks through the limitations of traditional datasets and uses unsupervised learning to optimize editing goals and achieve accurate image editing based on natural human commands.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper attempts to address the problem of achieving precise instruction-based image editing in complex image editing tasks. Existing models are limited by the quality of the dataset and cannot accurately locate the editing regions in images with complex object relationships. To solve this problem, the paper proposes a reinforcement learning-guided image editing method (InstructRL4Pix), which trains a diffusion model to generate images guided by target object attention maps. Specifically, this method calculates the distance between attention maps as a reward function and uses Proximal Policy Optimization (PPO) to fine-tune the diffusion model, thereby overcoming the limitations of traditional datasets and achieving accurate image editing based on natural language instructions. Experimental results show that InstructRL4Pix can achieve higher accuracy and better visual effects in complex image editing tasks.
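The two ingredients named above, a reward derived from the distance between attention maps and a PPO update, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the mean squared distance, and the `clip_eps` default are assumptions chosen for illustration; the summary does not say which distance the paper uses or how the attention maps are extracted from the diffusion model.

```python
import math


def attention_reward(edited_map, target_map):
    """Hypothetical reward: negative mean squared distance between two maps.

    Both arguments are 2D lists of floats with the same shape. The squared
    L2 form is an assumption; the source only says the reward is based on
    a distance between attention maps.
    """
    if len(edited_map) != len(target_map):
        raise ValueError("attention maps must have the same number of rows")
    total, count = 0.0, 0
    for row_e, row_t in zip(edited_map, target_map):
        if len(row_e) != len(row_t):
            raise ValueError("attention maps must have the same row length")
        for e, t in zip(row_e, row_t):
            total += (e - t) ** 2
            count += 1
    # A smaller distance gives a larger (less negative) reward.
    return -total / count


def ppo_clipped_objective(new_logp, old_logp, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for a single sample.

    new_logp and old_logp are log-probabilities of the same action (here,
    a denoising step) under the current and the previous policy.
    """
    ratio = math.exp(new_logp - old_logp)
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)
```

In a training loop along these lines, the reward would be computed for each edited image, turned into an advantage, and the clipped objective would be averaged over sampled denoising steps and maximized by gradient ascent.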