G-Refine: A General Quality Refiner for Text-to-Image Generation

Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchuan Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai

2024-04-29

Abstract: With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality results. To mitigate this limitation, we introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compromising the integrity of high-quality ones. The model is composed of three interconnected modules: a perception quality indicator, an alignment quality indicator, and a general quality enhancement module. Based on the mechanisms of the Human Visual System (HVS) and syntax trees, the first two indicators can respectively identify the perception and alignment deficiencies, and the last module can apply targeted quality enhancement accordingly. Extensive experimentation reveals that when compared to alternative optimization methods, AIGIs after G-Refine outperform in 10+ quality metrics across 4 databases. This improvement significantly contributes to the practical application of contemporary T2I models, paving the way for their broader adoption. The code will be released on

Multimedia, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses quality defects in images generated by Text-to-Image (T2I) models. Existing T2I models cannot always guarantee high-quality results in terms of either perceptual quality or alignment with the prompt. To mitigate this limitation, the authors propose a general image quality refiner called G-Refine, which consists of three interrelated modules:

1. **Perceptual Quality Metric (PQ-Map)**: By adjusting the image and text encoders of the CLIP model, this module obtains a perceptual quality weight map of the image. It identifies low-quality areas in the image and outputs a 2D perceptual quality map.
2. **Alignment Quality Metric (AQ-Map)**: This module performs syntactic analysis on the prompt, constructs a syntax tree, and evaluates how well each phrase aligns with the generated image. It then merges these per-phrase results into an alignment quality map for the entire prompt.
3. **General Quality Enhancement Module (Quality Refiner)**: Based on the perceptual and alignment quality maps provided by the first two modules, this module applies targeted quality enhancement to low-quality areas without compromising high-quality areas.

Experimental results show that images refined by G-Refine outperform those produced by other optimization methods across multiple databases and quality metrics, improving overall image quality and supporting the broader adoption of T2I models in practical applications.
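The idea of enhancing only low-quality regions can be illustrated with a small sketch. This is not the authors' implementation: the function name `refine_low_quality_regions`, the `threshold` parameter, the use of a per-pixel minimum to combine the two maps, and the linear blend are all assumptions made for illustration. It assumes both quality maps are already available as 2D arrays in [0, 1] (higher is better) and that some enhanced version of the image has been produced separately.

```python
import numpy as np

def refine_low_quality_regions(image, enhanced, pq_map, aq_map, threshold=0.5):
    """Blend `enhanced` into `image` only where the quality maps are low.

    image, enhanced: float arrays of shape (H, W, C).
    pq_map, aq_map: float arrays of shape (H, W) in [0, 1], higher = better.
    threshold: hypothetical cutoff (> 0); pixels at or above it are left untouched.
    """
    # Assumption: a pixel is only as good as its weaker score.
    quality = np.minimum(pq_map, aq_map)
    # Blend weight is 0 at or above the threshold and rises linearly to 1 at quality 0.
    weight = np.clip((threshold - quality) / threshold, 0.0, 1.0)
    # Add a channel axis so the (H, W) weight broadcasts over (H, W, C).
    weight = weight[..., None]
    return (1.0 - weight) * image + weight * enhanced

# Toy 2x2 example: the original is all zeros, the enhanced version all ones.
image = np.zeros((2, 2, 3))
enhanced = np.ones((2, 2, 3))
pq_map = np.array([[1.0, 0.0], [0.25, 1.0]])
aq_map = np.array([[1.0, 1.0], [1.0, 0.5]])
result = refine_low_quality_regions(image, enhanced, pq_map, aq_map)
```

In this toy case the top-left and bottom-right pixels meet the threshold and stay unchanged, the top-right pixel is fully replaced, and the bottom-left pixel is a half-and-half blend. A soft weight rather than a hard mask is one possible design choice that avoids visible seams between refined and untouched regions.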

Q-Refine: A Perceptual Quality Refiner for AI-Generated Image

Refine-by-Align: Reference-Guided Artifacts Refinement through Semantic Alignment

AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment

Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

Learning a No-Reference Quality Assessment Model of Enhanced Images With Big Data

CLIP-AGIQA: Boosting the Performance of AI-Generated Image Quality Assessment with CLIP

LFR-GAN: Local Feature Refinement based Generative Adversarial Network for Text-to-Image Generation

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

AI-Generated Image Quality Assessment Based on Task-Specific Prompt and Multi-Granularity Similarity

Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap

No-reference image quality assessment based on global awareness

A Perceptual Quality Assessment Exploration for AIGC Images

Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation

Fine Tuning Text-to-Image Diffusion Models for Correcting Anomalous Images

TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment

ReDiFine: Reusable Diffusion Finetuning for Mitigating Degradation in the Chain of Diffusion

RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images