Diminishing Stereotype Bias in Image Generation Model using Reinforcement Learning Feedback

Xin Chen, Virgile Foussereau
2024-06-28
Abstract: This study addresses gender bias in image generation models using Reinforcement Learning from Artificial Intelligence Feedback (RLAIF) with a novel Denoising Diffusion Policy Optimization (DDPO) pipeline. By employing a pretrained stable diffusion model and a highly accurate gender classification Transformer, the research introduces two reward functions: Rshift for shifting gender imbalances, and Rbalance for achieving and maintaining gender balance. Experiments demonstrate the effectiveness of this approach in mitigating bias without compromising image quality or requiring additional data or prompt modifications. While focusing on gender bias, this work establishes a foundation for addressing various forms of bias in AI systems, emphasizing the need for responsible AI development. Future research directions include extending the methodology to other bias types, enhancing the RLAIF pipeline's robustness, and exploring multi-prompt fine-tuning to further advance fairness and inclusivity in AI.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses stereotype bias in image generation models, with a focus on gender bias. Synthetic images have become difficult to distinguish from real ones, and this progress raises ethical concerns: models may amplify social stereotypes related to attributes such as gender and race. The paper proposes a method based on Reinforcement Learning from Artificial Intelligence Feedback (RLAIF) that reduces gender bias by fine-tuning a pretrained diffusion model, without additional data and without hard modifications to the prompts. It introduces two reward functions, Rshift and Rbalance. Rshift adjusts a gender imbalance quickly, within a few fine-tuning steps, while Rbalance is designed to reach and then maintain gender balance in the generated images. According to the reported experiments, the method reduces gender bias without sacrificing image quality and performs well on multiple occupation-related prompts. The paper also explores how improved trust region constraints can make fine-tuning more stable. Overall, it offers a new approach to reducing gender bias in image generation models and emphasizes the importance of responsible AI development.
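The summary names the two rewards but does not give their formulas. The following is a minimal illustrative sketch of how a "shift" reward and a "balance" reward could differ. It is not the paper's definition. It assumes a hypothetical classifier that returns, for each generated image, a probability p in [0, 1] for a chosen target class. The function names `shift_reward` and `balance_reward` and both formulas are assumptions made for illustration.

```python
from typing import List


def shift_reward(p_target: List[float]) -> List[float]:
    """Hypothetical 'shift' reward: each image is rewarded by the
    classifier probability of the target class, so fine-tuning
    always pushes generation toward that class."""
    return list(p_target)


def balance_reward(p_target: List[float]) -> List[float]:
    """Hypothetical 'balance' reward: the sign and size of the reward
    depend on how far the batch is from a 50/50 split.

    gap > 0 means the target class is under-represented in the batch,
    so images classified as the target class get a positive reward.
    gap < 0 reverses the sign. At gap == 0 every reward is zero, so a
    balanced batch produces no further push in either direction."""
    mean_p = sum(p_target) / len(p_target)
    gap = 0.5 - mean_p
    # (2 * p - 1) maps a probability in [0, 1] to a score in [-1, 1].
    return [gap * (2.0 * p - 1.0) for p in p_target]


if __name__ == "__main__":
    # A batch where one image in four belongs to the target class.
    print(balance_reward([0.0, 0.0, 0.0, 1.0]))
    # A balanced batch receives zero reward everywhere.
    print(balance_reward([0.0, 1.0, 0.0, 1.0]))
```

In this sketch the shift-style reward keeps pushing in one direction regardless of the current split, whereas the balance-style reward shrinks to zero as the batch approaches parity. That contrast matches the roles the summary attributes to Rshift (fast adjustment) and Rbalance (reaching and maintaining balance), but the actual reward definitions should be taken from the paper itself.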