Analyzing Quality, Bias, and Performance in Text-to-Image Generative Models

Nila Masrourisaadat, Nazanin Sedaghatkish, Fatemeh Sarshartehrani, Edward A. Fox
2024-06-28
Abstract: Advances in generative models have led to significant interest in image synthesis, demonstrating the ability to generate high-quality images for a diverse range of text prompts. Despite this progress, most studies ignore the presence of bias. In this paper, we examine several text-to-image models not only by qualitatively assessing their performance in generating accurate images of human faces, groups, and specified numbers of objects but also by presenting a social bias analysis. As expected, models with larger capacity generate higher-quality images. However, we also document the inherent gender or social biases these models possess, offering a more complete understanding of their impact and limitations.
Artificial Intelligence, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses two main problems:

1. **Quality and performance evaluation of image generation**: The paper evaluates how well text-to-image models generate high-quality images, particularly images with complex facial features and motion attributes. In addition to the visual quality of the outputs, the authors examine how faithfully the models follow specific text prompts, for example by generating a specified number of objects, human faces, or groups of people.
2. **Analysis of social biases**: Beyond technical performance, the paper examines the gender and social biases present in these models. Through a series of designed tests (such as gender-bias and racial-bias tests), the authors show that the models exhibit inherent biases even when given neutral text prompts. Examples include a tendency to depict certain professional roles (such as CEOs and managers) as white men, and a tendency to generate male figures for gender-neutral words such as "person" and "human".

By combining these two evaluations, the paper aims to give a fuller picture of what current text-to-image models can do, along with their potential social impacts and limitations. Such an understanding matters for the responsible development of the field, in particular for ensuring that these technologies are applied fairly and ethically.
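To make the two kinds of evaluation more concrete, the sketch below shows one simple way such results could be tallied. It is a minimal illustration under stated assumptions, not the authors' protocol: it assumes each generated image has already been annotated (by a person or a classifier) with a detected object count and, for person prompts, a binary perceived-gender label. The function names `count_accuracy` and `gender_skew`, the prompts, and the annotation counts are all hypothetical. Count accuracy is the fraction of images whose detected count matches the requested one; gender skew is the share of images labelled male minus 0.5, so 0 means parity and positive values mean a male skew.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_accuracy(requested: List[int], detected: List[int]) -> float:
    """Fraction of images whose detected object count equals the requested count.

    Assumes detected[i] came from a (hypothetical) annotator or detector run on
    the image generated from a prompt asking for requested[i] objects.
    """
    if not requested or len(requested) != len(detected):
        raise ValueError("requested and detected must be non-empty and equal length")
    hits = sum(r == d for r, d in zip(requested, detected))
    return hits / len(requested)


def gender_skew(labels_by_prompt: Dict[str, Iterable[str]]) -> Dict[str, float]:
    """Per prompt, share of images labelled 'male' minus 0.5 (parity).

    Labels other than 'male' and 'female' are ignored; prompts with no such
    labels are omitted from the result. Positive values indicate a male skew.
    """
    skew: Dict[str, float] = {}
    for prompt, labels in labels_by_prompt.items():
        counts = Counter(labels)  # missing keys count as 0
        total = counts["male"] + counts["female"]
        if total == 0:
            continue
        skew[prompt] = counts["male"] / total - 0.5
    return skew


if __name__ == "__main__":
    # Hypothetical annotations for ten images per neutral prompt (not paper data).
    labels = {
        "a photo of a person": ["male"] * 8 + ["female"] * 2,
        "a photo of a CEO": ["male"] * 9 + ["female"] * 1,
    }
    for prompt, s in gender_skew(labels).items():
        print(f"{prompt}: {s:+.2f}")  # prints +0.30, then +0.40
    # Hypothetical requested vs. detected counts for four prompts.
    print(f"count accuracy: {count_accuracy([1, 2, 3, 4], [1, 2, 3, 5]):.2f}")  # 0.75
```

A binary perceived-gender label is a deliberate simplification here; a fuller analysis would need to handle ambiguous presentations and, for racial bias, a separate set of labels and a different reference distribution than a 50/50 split.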