AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment

Chunyi Li, Tengchuan Kou, Yixuan Gao, Yuqin Cao, Wei Sun, Zicheng Zhang, Yingjie Zhou, Zhichao Zhang, Weixia Zhang, Haoning Wu, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai
2024-04-04
Abstract: With the rapid advancements in AI-Generated Content (AIGC), AI-Generated Images (AIGIs) have been widely applied in entertainment, education, and social media. However, due to the significant variance in quality among different AIGIs, there is an urgent need for models that consistently match human subjective ratings. To address this issue, we organized a challenge on AIGC quality assessment at NTIRE 2024 that covers 15 popular generative models, uses dynamic hyper-parameters (including classifier-free guidance, iteration epochs, and output image resolution), and gathers subjective scores from 21 subjects that jointly consider perceptual quality and text-to-image alignment. This approach culminates in the creation of the largest fine-grained AIGI subjective quality database to date with 20,000 AIGIs and 420,000 subjective ratings, known as AIGIQA-20K. Furthermore, we conduct benchmark experiments on this database to assess the correspondence between 16 mainstream AIGI quality models and human perception. We anticipate that this large-scale quality database will inspire robust quality indicators for AIGIs and propel the evolution of AIGC for vision. The database is released on
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses quality assessment for AI-Generated Images (AIGIs). With the rapid development of AI-generated content, and in particular the widespread use of Text-to-Image (T2I) models in entertainment, education, and social media, there is an urgent need for quality models whose predictions consistently match human subjective ratings. According to the paper, existing Image Quality Assessment (IQA) metrics are not well suited to AIGIs, whose quality is influenced not only by image processing but also by hardware limitations and technical proficiency.
To address this, the paper builds a large-scale database, AIGIQA-20K, containing 20,000 images from 15 popular T2I models and 420,000 subjective ratings in total (21 ratings per image). The database varies hyperparameters such as Classifier-Free Guidance (CFG), iteration count, and output resolution so that it covers the range of visual distortions seen in AIGIs. It also uses real user inputs from the AIGC community as prompts to obtain more representative quality scores.
Analysis of the database shows significant variation in image quality across models, reflecting the influence of the model itself, the prompt text, and the hyperparameters. Experimental results show that non-default hyperparameter configurations decrease image quality, and that the latest models, even at their worst, outperform the best results of older models. Prompt length also affects quality: some models excel with short prompts, while others perform better with longer ones.
The paper also conducts a subjective quality assessment experiment in which 21 participants score the images. Spearman's rank correlation coefficient is used to remove outliers, and the raw scores are transformed into logarithmic standardized Mean Opinion Scores (MOS) to obtain a more uniform distribution suitable for IQA tasks.
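The subject screening and MOS computation described above can be illustrated with a minimal sketch. It is not the paper's actual procedure. The function names, the 0.5 correlation threshold, and the comparison of each subject against the mean of the other subjects are all assumptions made for illustration. The sketch uses plain z-score standardization and omits the logarithmic step, because the summary does not specify its form.

```python
from statistics import mean, pstdev

def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def screen_and_score(ratings, min_srcc=0.5):
    """ratings maps subject id -> list of raw scores, one per image.

    Hypothetical procedure: drop subjects whose scores correlate poorly
    with the mean of all other subjects, then z-score each remaining
    subject and average per image to obtain a MOS-like value.
    """
    subjects = list(ratings)
    n_images = len(ratings[subjects[0]])
    kept = []
    for s in subjects:
        others = [ratings[o] for o in subjects if o != s]
        consensus = [mean(col) for col in zip(*others)]
        if spearman(ratings[s], consensus) >= min_srcc:
            kept.append(s)
    z = {}
    for s in kept:
        mu = mean(ratings[s])
        sigma = pstdev(ratings[s]) or 1.0  # guard against constant raters
        z[s] = [(v - mu) / sigma for v in ratings[s]]
    mos = [mean(z[s][i] for s in kept) for i in range(n_images)]
    return kept, mos

# Toy data (invented for illustration): subject "d" rates in reverse order.
toy = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 2, 3, 5, 5],
    "c": [1, 3, 3, 4, 4],
    "d": [5, 4, 3, 2, 1],
}
kept, mos = screen_and_score(toy)
print(kept)  # subjects that pass the screening
```

In this toy example, subject "d" disagrees with everyone else and is removed before the scores are averaged. Standardizing per subject before averaging keeps a habitually harsh or lenient rater from shifting the resulting scores.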
By conducting an in-depth analysis of the AIGIQA-20K database, the paper summarizes the main factors that affect AIGI subjective quality: T2I models, prompt texts, and hyperparameters. These findings contribute to the development of more accurate AIGI quality metrics and the advancement of the AIGC field.