Visual Verity in AI-Generated Imagery: Computational Metrics and Human-Centric Analysis

Memoona Aziz, Umair Rehman, Syed Ali Safi, Amir Zaib Abbasi
2024-09-02
Abstract: The rapid advancements in AI technologies have revolutionized the production of graphical content across various sectors, including entertainment, advertising, and e-commerce. These developments have spurred the need for robust evaluation methods to assess the quality and realism of AI-generated images. To address this, we conducted three studies. First, we introduced and validated a questionnaire called Visual Verity, which measures photorealism, image quality, and text-image alignment. Second, we applied this questionnaire to assess images from AI models (DALL-E2, DALL-E3, GLIDE, Stable Diffusion) and camera-generated images, revealing that camera-generated images excelled in photorealism and text-image alignment, while AI models led in image quality. We also analyzed statistical properties, finding that camera-generated images scored lower in hue, saturation, and brightness. Third, we evaluated computational metrics' alignment with human judgments, identifying MS-SSIM and CLIP as the most consistent with human assessments. Additionally, we proposed the Neural Feature Similarity Score (NFSS) for assessing image quality. Our findings highlight the need for refining computational metrics to better capture human visual perception, thereby enhancing AI-generated content evaluation.
Human-Computer Interaction, Artificial Intelligence
What problem does this paper attempt to address?
The paper aims to address the following key issues:

1. **Development and Validation of Evaluation Metrics**: The paper designs three subjective questionnaires that assess the key dimensions of AI-generated images: photorealism, image quality, and text-image alignment. The questionnaires are statistically validated for reliability and validity.
2. **Comparison of Pixel-Level and Model-Level Metrics**: The study compares pixel-level metrics (such as SSIM and PSNR) with model-level metrics (such as FID, LPIPS, and CLIP) and analyzes how consistent each metric is with human perception.
3. **Design of Neural Feature Similarity Score (NFSS)**: A new metric, NFSS, is proposed to evaluate image quality, with the aim of aligning more closely with human judgment.
4. **Expert Evaluation Strategy**: The Interpolative Binning Scale (IBS) is introduced so that human ratings and metric outputs can be compared fairly.
5. **Performance Comparison of Different AI Models**: By comparing images generated by DALL-E2, DALL-E3, GLIDE, and Stable Diffusion, the study examines how these models differ across the three dimensions.

In summary, the main goal of this paper is to provide a comprehensive, statistically validated approach to evaluating the photorealism, text-image alignment, and image quality of AI-generated images, so that the evaluation methods better reflect human perception.
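To make the idea of checking a computational metric against human judgments concrete, the following is a minimal sketch rather than the paper's procedure. It computes PSNR, one of the pixel-level metrics named above, and then measures how well the metric's ranking of images agrees with a set of human ratings using Spearman rank correlation. The choice of Spearman correlation, the function names, and all pixel values and ratings are assumptions made for illustration; the paper's own comparison relies on its Interpolative Binning Scale, which is not reproduced here.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized flat pixel lists."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have no noise
    return 10.0 * math.log10(max_value ** 2 / mse)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the two rank lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical data: one reference image and three distorted versions,
    # each flattened to a short list of 8-bit grayscale pixel values.
    reference = [10, 60, 120, 200]
    candidates = [
        [12, 58, 121, 199],   # mild distortion
        [20, 50, 130, 190],   # moderate distortion
        [60, 10, 170, 150],   # heavy distortion
    ]
    human_ratings = [4.6, 3.9, 2.1]  # hypothetical mean quality ratings

    metric_scores = [psnr(reference, c) for c in candidates]
    print("PSNR (dB):", [round(s, 2) for s in metric_scores])
    print("Spearman correlation with human ratings:",
          spearman(metric_scores, human_ratings))
```

A correlation near 1 would indicate that the metric orders images the same way the raters do, while a value near 0 would indicate little agreement. Model-level metrics such as LPIPS or CLIP could be substituted for `psnr` in the same comparison, given a function that returns one score per image.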