Mitigating Social Biases in Text-to-Image Diffusion Models Via Linguistic-Aligned Attention Guidance

Yue Jiang,Yueming Lyu,Ziwen He,Bo Peng,Jing Dong
DOI: https://doi.org/10.1145/3664647.3680748
2024-01-01
Abstract:Recent advancements in text-to-image generative models have showcased remarkable capabilities across various tasks. However, these powerful models have revealed the inherent risks of social biases. Such biases can propagate distorted real-world perspectives and spread unforeseen prejudice and discrimination. Current debiasing methods are primarily designed for scenarios with a single individual in the image and exhibit homogenous race or gender when multiple individuals are involved, harming the diversity of social groups within the image. To address this problem, we consider the semantic consistency between text prompts and generated images in text-to-image diffusion models to identify how biases are generated. We propose a novel method to locate where the biases are based on different tokens and then mitigate them for each individual. Specifically, we introduce a Linguistic-aligned Attention Guidance module consisting of Block Voting and Linguistic Alignment, to effectively locate the semantic regions related to biases. Additionally, we employ Fair Inference in these regions to generate fair attributes across arbitrary distributions while preserving the original structural and semantic information. Extensive experiments and analyses demonstrate our method outperforms existing methods for debiasing with multiple individuals across various scenarios.
What problem does this paper attempt to address?