Abstract: With the rapid advancements in AI-Generated Content (AIGC), AI-Generated Images (AIGIs) have been widely applied in entertainment, education, and social media. However, due to the significant variance in quality among different AIGIs, there is an urgent need for models that consistently match human subjective ratings. To address this issue, we organized a challenge on AIGC quality assessment at NTIRE 2024 that extensively considers 15 popular generative models, utilizes dynamic hyper-parameters (including classifier-free guidance, iteration epochs, and output image resolution), and gathers subjective scores from 21 subjects that jointly consider perceptual quality and text-to-image alignment. This approach culminates in the creation of the largest fine-grained AIGI subjective quality database to date, with 20,000 AIGIs and 420,000 subjective ratings, known as AIGIQA-20K. Furthermore, we conduct benchmark experiments on this database to assess the correspondence between 16 mainstream AIGI quality models and human perception. We anticipate that this large-scale quality database will inspire robust quality indicators for AIGIs and propel the evolution of AIGC for vision. The database is released on

What problem does this paper attempt to address?

This paper addresses quality assessment for AI-Generated Images (AIGIs). As AI-generated content develops rapidly, and Text-to-Image (T2I) models in particular are widely used in entertainment, education, and social media, there is an urgent need for quality models whose predictions consistently match human subjective ratings. Existing Image Quality Assessment (IQA) metrics are not well suited to AIGIs, whose quality is influenced not only by image processing but also by hardware limitations and technical proficiency.

To address this issue, the paper builds a large-scale database called AIGIQA-20K. It contains 20,000 images from 15 popular T2I models and 420,000 subjective ratings in total, consistent with 21 subjects rating each of the 20,000 images. The database varies adjustable hyperparameters, namely Classifier-Free Guidance (CFG), iteration count, and output resolution, to cover the range of visual distortions found in AIGIs. It also uses real user inputs from the AIGC community as prompts, so that the resulting quality scores reflect realistic usage.

Analysis of the database shows significant variation in image quality across models, which points to three influences on AIGI quality: the T2I model itself, the prompt text, and the hyperparameters. Non-default hyperparameter configurations decrease image quality, and the latest models, even at their worst, outperform the best results of older models. Prompt length also affects quality: some models excel with short prompts, while others perform better with longer ones.

For the subjective quality assessment experiment, 21 participants scored the images. Spearman's rank correlation coefficient (SRCC) is used to identify and remove outliers, and the raw scores are converted into logarithmically standardized Mean Opinion Scores (MOS) to obtain a more uniform distribution suitable for IQA tasks.

In summary, the in-depth analysis of AIGIQA-20K identifies the main factors that affect AIGI subjective quality: T2I models, prompt texts, and hyperparameters. These findings contribute to the development of more accurate AIGI quality metrics and to the advancement of the AIGC field.
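The subject screening and score normalization mentioned in the summary can be illustrated with a minimal sketch. It is not the paper's actual procedure. The function names, the leave-one-out comparison, the `min_srcc=0.5` threshold, and the final linear rescaling to a 0 to 5 range are all assumptions made for illustration. The sketch uses plain per-subject z-scores and omits the logarithmic step, because the summary does not give its exact form.

```python
import numpy as np

def average_ranks(x):
    """Rank a 1-D array from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(x.size, dtype=float)
    ranks[order] = np.arange(1, x.size + 1)
    # Average the ranks within each group of equal values.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return (rank_sums / counts)[inverse]

def spearman(a, b):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

def screen_subjects(ratings, min_srcc=0.5):
    """Return a boolean mask of subjects to keep.

    ratings: array of shape (n_subjects, n_images).
    A subject is kept when the SRCC between their scores and the mean
    score of all other subjects reaches min_srcc (hypothetical threshold).
    """
    ratings = np.asarray(ratings, dtype=float)
    keep = np.zeros(ratings.shape[0], dtype=bool)
    for i in range(ratings.shape[0]):
        others_mean = np.delete(ratings, i, axis=0).mean(axis=0)
        keep[i] = spearman(ratings[i], others_mean) >= min_srcc
    return keep

def ratings_to_mos(ratings, keep, low=0.0, high=5.0):
    """Z-score each kept subject, average per image, rescale to [low, high]."""
    kept = np.asarray(ratings, dtype=float)[keep]
    mean = kept.mean(axis=1, keepdims=True)
    std = kept.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # a subject with constant scores contributes zeros
    z = ((kept - mean) / std).mean(axis=0)
    span = z.max() - z.min()
    if span == 0:
        return np.full(z.shape, (low + high) / 2)
    return low + (high - low) * (z - z.min()) / span
```

With a ratings matrix of shape (21, 20000), `screen_subjects` would return one flag per subject and `ratings_to_mos` one score per image. Z-scoring each subject before averaging removes individual differences in how strictly or generously people use the rating scale, so each kept subject contributes equally to the final score.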

AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment
