PKU-I2IQA: An Image-to-Image Quality Assessment Database for AI Generated Images

Jiquan Yuan, Xinyan Cao, Changjin Li, Fanyi Yang, Jinlong Lin, Xixin Cao
2023-11-29
Abstract: As image generation technology advances, AI-based image generation has been applied in various fields, and Artificial Intelligence Generated Content (AIGC) has garnered widespread attention. However, the development of AI-based image generative models also brings new problems and challenges. A significant challenge is that AI-generated images (AIGIs) may exhibit distortions that natural images do not, and not all generated images meet real-world requirements. It is therefore important to evaluate AIGIs more comprehensively. Although previous work has established several human perception-based AIGC image quality assessment (AIGCIQA) databases for text-generated images, AI image generation covers scenarios such as text-to-image and image-to-image, so assessing only the images produced by text-to-image models is insufficient. To address this issue, we establish a human perception-based image-to-image AIGCIQA database, named PKU-I2IQA. We conduct a well-organized subjective experiment to collect quality labels for AIGIs and then conduct a comprehensive analysis of the PKU-I2IQA database. Furthermore, we propose two benchmark models: NR-AIGCIQA, based on the no-reference image quality assessment method, and FR-AIGCIQA, based on the full-reference image quality assessment method. Finally, leveraging this database, we conduct benchmark experiments and compare the performance of the proposed benchmark models. The PKU-I2IQA database and benchmarks will be released at https://github.com/jiquan123/I2IQA to facilitate future research.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses the quality assessment of AI-generated images (AIGIs). As AI image generation technology develops, AI-generated images are being used in many fields, but they can exhibit distortions not seen in natural images and do not always meet real-world requirements, so a comprehensive evaluation of their quality has become important. Existing human perception-based AI-generated content (AIGC) image quality assessment (AIGCIQA) databases focus mainly on text-to-image generation models and neglect image-to-image generation techniques. This leaves a critical gap in current research: there is no database dedicated to image-to-image generation scenarios. To fill this gap, the authors build the first human perception-based image-to-image AIGCIQA database, named PKU-I2IQA. They also propose two benchmark models, NR-AIGCIQA, based on no-reference image quality assessment methods, and FR-AIGCIQA, based on full-reference image quality assessment methods, and use the database to run benchmark experiments comparing the two.

### Main Contributions

1. **Establishment of the first human perception-based image-to-image AIGCIQA database**: PKU-I2IQA.
2. **Proposal of two benchmark models**: NR-AIGCIQA, based on no-reference image quality assessment methods, and FR-AIGCIQA, based on full-reference image quality assessment methods.
3. **Benchmark experiments**: evaluation and comparison of the proposed benchmark models on the PKU-I2IQA database.

### Method Overview

- **Database Construction**: 200 categories were selected from ImageNet, and one corresponding high-resolution image per category was collected as an image prompt. Midjourney and Stable Diffusion V1.5 were used to generate images, with each model producing 4 images per image prompt, for a total of 200 × 2 × 4 = 1600 images.
- **Subjective Experiments**: A subjective experiment was organized to collect quality labels, rating images along three dimensions: quality, realism, and text-image correspondence.
- **Benchmark Models**: Two benchmark models were proposed, based on no-reference and full-reference image quality assessment methods respectively. Both use a pre-trained backbone network to extract features and a regression network to predict image quality scores.

### Experimental Results

- **Performance Comparison**: The FR-AIGCIQA benchmark model outperformed the NR-AIGCIQA benchmark model.
- **Best Performance**: Among the backbone networks tested, ResNet18 performed best on quality and correspondence, ResNet50 on the final score, and InceptionV4 on realism.

### Conclusion

Although the proposed benchmark models achieve a degree of success, there is still considerable room for improvement in AIGCIQA model design. Future research will focus on how to introduce reference images into text-to-image generation scenarios, which lack image prompts, to improve model performance. The authors also conducted cross-model evaluation experiments, which showed that the proposed benchmark models generalize poorly across different generation models.
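The database composition described under Method Overview can be enumerated directly, which makes the 1600-image total easy to check and shows how a cross-model split (train on one generator, test on the other) partitions the data. This is a minimal sketch under stated assumptions: the `Sample` record, its field names, `build_index` and `cross_model_split` are hypothetical and do not describe the released files, and the figure of 4 images per prompt per model is inferred from 1600 / (200 prompts × 2 models). In a full-reference setting, each sample's reference would be the image prompt identified by `prompt_id`.

```python
from itertools import product
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    """One AI-generated image in a hypothetical database index."""
    prompt_id: int   # which of the 200 ImageNet-derived image prompts
    generator: str   # which model produced the image
    sample_idx: int  # which of the 4 generations for this prompt and model


NUM_PROMPTS = 200
GENERATORS = ("Midjourney", "Stable Diffusion V1.5")
IMAGES_PER_PROMPT_PER_MODEL = 4  # inferred: 1600 / (200 prompts * 2 models)


def build_index() -> List[Sample]:
    # One entry per (image prompt, generator, generation) combination.
    return [
        Sample(p, g, k)
        for p, g, k in product(
            range(NUM_PROMPTS), GENERATORS, range(IMAGES_PER_PROMPT_PER_MODEL)
        )
    ]


def cross_model_split(
    index: List[Sample], train_generator: str
) -> Tuple[List[Sample], List[Sample]]:
    # Train on images from one generator, test on images from the others.
    train = [s for s in index if s.generator == train_generator]
    test = [s for s in index if s.generator != train_generator]
    return train, test


index = build_index()
print(len(index))  # 1600
train, test = cross_model_split(index, "Midjourney")
print(len(train), len(test))  # 800 800
```

Because every test image comes from a generator never seen during training, a split like this probes exactly the cross-model generalization that the Conclusion reports as weak.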
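The two benchmark models differ mainly in what the regression network sees: NR-AIGCIQA sees only the generated image, while FR-AIGCIQA can also use the image prompt as a reference. The sketch below illustrates that difference under loudly stated assumptions. `toy_backbone` is a hypothetical stand-in for a pretrained backbone such as ResNet18 (it only computes three pixel statistics), the FR variant is assumed to fuse features by simple concatenation, and `linear_head` replaces the paper's regression network with a single linear layer. None of these names or design choices are taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def toy_backbone(image: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a pretrained backbone (e.g. ResNet18).

    Maps an image, given as a flat sequence of pixel values, to a small
    feature vector: [mean intensity, variance, mean absolute gradient].
    """
    n = len(image)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean = sum(image) / n
    variance = sum((p - mean) ** 2 for p in image) / n
    gradient = sum(abs(image[i + 1] - image[i]) for i in range(n - 1)) / (n - 1)
    return [mean, variance, gradient]


def nr_features(generated: Sequence[float]) -> Vector:
    # No-reference: only the generated image is available to the model.
    return toy_backbone(generated)


def fr_features(generated: Sequence[float], reference: Sequence[float]) -> Vector:
    # Full-reference (assumed fusion): concatenate the generated image's
    # features with those of its image prompt, which acts as the reference.
    return toy_backbone(generated) + toy_backbone(reference)


def linear_head(features: Vector, weights: Vector, bias: float = 0.0) -> float:
    # Stand-in for the regression network that maps features to a score.
    if len(features) != len(weights):
        raise ValueError("feature and weight dimensions differ")
    return sum(f * w for f, w in zip(features, weights)) + bias


generated = [0.0, 1.0, 0.0, 1.0]
reference = [0.5, 0.5, 0.5, 0.5]
print(nr_features(generated))                  # [0.5, 0.25, 1.0]
print(len(fr_features(generated, reference)))  # 6
score = linear_head(fr_features(generated, reference), weights=[0.1] * 6)
print(score)
```

In a real pipeline the backbone would be a pretrained CNN and the head would be trained on the subjective scores; the point of the sketch is only that the FR input carries information about the image prompt that the NR input lacks.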
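The summary compares benchmark models and backbones without naming the evaluation metrics. Image quality assessment benchmarks are conventionally scored with the Spearman rank-order correlation (SRCC) and the Pearson linear correlation (PLCC) between predicted scores and subjective scores; the sketch below assumes those metrics. It computes plain PLCC and omits the nonlinear logistic mapping that is often fitted before PLCC in IQA practice. The helper names are illustrative, not from the paper.

```python
import math
from typing import List, Sequence


def plcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation between predictions and subjective scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def srcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


predicted = [1.0, 2.0, 3.0, 4.0]
mos = [1.0, 4.0, 9.0, 16.0]        # monotonic but nonlinear in the prediction
print(srcc(predicted, mos))        # 1.0
print(plcc(predicted, mos) < 1.0)  # True
```

The example shows why both metrics are usually reported: SRCC rewards getting the ordering right (here it is perfect), while PLCC also penalizes a nonlinear relationship between predictions and subjective scores.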