X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models

Yixiong Chen, Li Liu, Chris Ding

2023-05-26

Abstract: This paper introduces a novel explainable image quality evaluation approach called X-IQE, which leverages visual large language models (LLMs) to evaluate text-to-image generation methods by generating textual explanations. X-IQE utilizes a hierarchical Chain of Thought (CoT) to enable MiniGPT-4 to produce self-consistent, unbiased texts that are highly correlated with human evaluation. It offers several advantages, including the ability to distinguish between real and generated images, evaluate text-image alignment, and assess image aesthetics without requiring model training or fine-tuning. X-IQE is more cost-effective and efficient compared to human evaluation, while significantly enhancing the transparency and explainability of deep image quality evaluation models. We validate the effectiveness of our method as a benchmark using images generated by prevalent diffusion models. X-IQE demonstrates similar performance to state-of-the-art (SOTA) evaluation methods on COCO Caption, while overcoming the limitations of previous evaluation models on DrawBench, particularly in handling ambiguous generation prompts and text recognition in generated images. Project website: https://github.com/Schuture/Benchmarking-Awesome-Diffusion-Models
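The abstract states that X-IQE's outputs are "highly correlated with human evaluation". A common way to quantify that kind of agreement is a rank correlation between per-image evaluator scores and human ratings. The sketch below computes Spearman's rank correlation in plain Python; choosing Spearman here is an assumption for illustration (this summary does not describe the paper's exact agreement protocol), and the numbers in the usage lines are toy values, not results from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based sorted positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(model_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    if len(model_scores) != len(human_scores) or len(model_scores) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(model_scores), average_ranks(human_scores))


# Toy values for illustration only (not data from the paper).
model = [6.5, 7.0, 4.0, 8.5]  # hypothetical per-image scores from an evaluator
human = [3, 4, 2, 5]          # hypothetical human ratings for the same images
print(spearman(model, human))  # → 1.0 (identical orderings)
```

Spearman only compares orderings, so it tolerates an evaluator whose scores sit on a different scale from the human ratings. If SciPy is available, `scipy.stats.spearmanr` computes the same statistic, also using average ranks for ties.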

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The paper aims to address several key issues in image quality assessment for text-to-image generation:

1. **Limitations of existing methods**: Human evaluation is costly and hard to reproduce, while model-based evaluation methods require complex models and specially annotated data, and they lack the generalization ability of human raters.
2. **Interpretability and transparency**: Existing evaluation models usually only predict an image quality score, which makes it hard to explain how biases and defects in their training data lead to poor model performance.

The paper proposes X-IQE, a new explainable image quality evaluation method that uses pre-trained visual large language models (such as MiniGPT-4) to generate textual analyses of images and thereby achieve a comprehensive assessment of image quality. X-IQE offers the following advantages:

- **Interpretability**: Generates descriptions of its reasoning process through Chain of Thought (CoT).
- **Comprehensiveness**: Its designed prompts support multi-faceted assessment rather than being limited to specific features.
- **Strong performance**: Leverages the strong generalization ability of large language models.
- **Unbiasedness**: Avoids biases introduced by dataset annotations by relying on objective prompt texts.
- **No training required**: Uses the capabilities of pre-trained models without additional data collection or training.

Extensive experiments on real and AI-generated images validate the effectiveness of X-IQE and demonstrate its potential as a benchmark for evaluating text-to-image generation models.
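The interpretability, comprehensiveness, and no-training points all rest on prompting a pre-trained visual LLM through a chain of reasoning steps. Below is a minimal sketch of one way such a hierarchical chain could be organized; it is not the authors' implementation. The `QueryFn` interface, the prompt wording, the 1-10 scale, and the description, then analysis, then score structure are all assumptions for illustration, and `stub_query` is a canned stand-in for a real model such as MiniGPT-4.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical interface to a visual LLM such as MiniGPT-4:
# (image_id, chat_history, new_prompt) -> reply text.
History = List[Tuple[str, str]]
QueryFn = Callable[[str, History, str], str]

# Hypothetical prompts and 1-10 scale; not the actual X-IQE prompt text.
DESCRIBE = "Describe this image in detail."
SCORE_PREFIX = "Based on your analysis, "
ASPECTS: Dict[str, Tuple[str, str]] = {
    "fidelity": (
        "Point out any signs that this image is AI-generated rather than a real photo.",
        SCORE_PREFIX + "rate how realistic the image is from 1 to 10.",
    ),
    "alignment": (
        "Compare the image with the prompt '{prompt}'. Which elements match or are missing?",
        SCORE_PREFIX + "rate the text-image alignment from 1 to 10.",
    ),
    "aesthetics": (
        "Analyze the composition, color, and lighting of the image.",
        SCORE_PREFIX + "rate the aesthetic quality from 1 to 10.",
    ),
}

# Matches the first standalone integer in 1..10 (tries "10" before one digit).
SCORE_RE = re.compile(r"\b(10|[1-9])\b")


def parse_score(reply: str) -> int:
    """Extract the first 1-10 integer from a free-text reply."""
    match = SCORE_RE.search(reply)
    if match is None:
        raise ValueError(f"no 1-10 score found in reply: {reply!r}")
    return int(match.group(1))


def evaluate(query: QueryFn, image_id: str, gen_prompt: str) -> Dict[str, dict]:
    """Three-level chain: shared description -> per-aspect analysis -> score."""
    # Level 1: one description that every aspect builds on.
    root: History = []
    description = query(image_id, root, DESCRIBE)
    root.append((DESCRIBE, description))

    results: Dict[str, dict] = {}
    for aspect, (analysis_tpl, score_prompt) in ASPECTS.items():
        # Each aspect branches from the description only, so one aspect's
        # reasoning does not leak into another's.
        branch = list(root)
        analysis_prompt = analysis_tpl.format(prompt=gen_prompt)
        # Level 2: aspect-specific reasoning, kept as the explanation text.
        explanation = query(image_id, branch, analysis_prompt)
        branch.append((analysis_prompt, explanation))
        # Level 3: a score conditioned on the reasoning above.
        score = parse_score(query(image_id, branch, score_prompt))
        results[aspect] = {"explanation": explanation, "score": score}
    return results


def stub_query(image_id: str, history: History, prompt: str) -> str:
    """Canned stand-in for a real visual LLM, for demonstration only."""
    if prompt.startswith(SCORE_PREFIX):
        return "I would give it 7 out of 10."
    return f"[{len(history)} prior turns] analysis of {image_id}"


report = evaluate(stub_query, "img_001.png", "a red bicycle leaning on a wall")
for aspect, item in report.items():
    print(aspect, item["score"])
```

Branching every aspect from the shared description is a design choice in this sketch: it keeps, for example, the aesthetic analysis from influencing the realism score, while each score still comes with its own explanation text. A real run would replace `stub_query` with a wrapper around an actual visual LLM that accepts an image plus multi-turn history.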


Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark

Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models

A Novel Evaluation Framework for Image2Text Generation

IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models

T2I-Scorer: Quantitative Evaluation on Text-to-Image Generation Via Fine-Tuned Large Multi-Modal Models

Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis

Evaluating Text-to-Visual Generation with Image-to-Text Generation

Visual question answering based evaluation metrics for text-to-image generation

XAI Benchmark for Visual Explanation

HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator

IQAGPT: computed tomography image quality assessment with vision-language and ChatGPT models

GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation

Interactive Visual Assessment for Text-to-Image Generation Models

Descriptive Image Quality Assessment in the Wild

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation

Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment

Holistic Evaluation of Text-To-Image Models

Holistic Evaluation for Interleaved Text-and-Image Generation

Exploring Rich Subjective Quality Information for Image Quality Assessment in the Wild