G-Refine: A General Quality Refiner for Text-to-Image Generation

Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchaun Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai
2024-04-29
Abstract: With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality results. To mitigate this limitation, we introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compromising the integrity of high-quality ones. The model is composed of three interconnected modules: a perception quality indicator, an alignment quality indicator, and a general quality enhancement module. Based on the mechanisms of the Human Visual System (HVS) and syntax trees, the first two indicators can respectively identify the perception and alignment deficiencies, and the last module can apply targeted quality enhancement accordingly. Extensive experimentation reveals that when compared to alternative optimization methods, AIGIs after G-Refine outperform in 10+ quality metrics across 4 databases. This improvement significantly contributes to the practical application of contemporary T2I models, paving the way for their broader adoption. The code will be released on
Multimedia, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses quality defects in images produced by Text-to-Image (T2I) models. Existing T2I models cannot always guarantee high-quality results in terms of either perceptual quality or text-image alignment quality. To mitigate this limitation, the authors propose G-Refine, a general image quality refiner that consists of three interrelated modules:

1. **Perception Quality Indicator (PQ-Map)**: By adjusting the image and text encoders of the CLIP model, this module identifies low-quality areas in the image and outputs a 2D perceptual quality map.
2. **Alignment Quality Indicator (AQ-Map)**: This module parses the prompt into a syntax tree, evaluates how well each phrase aligns with the generated image, and merges the per-phrase results into an alignment quality map for the entire prompt.
3. **General Quality Enhancement Module (Quality Refiner)**: Guided by the perceptual and alignment quality maps from the first two modules, this module applies targeted enhancement to low-quality areas without degrading high-quality areas.

Experimental results show that images refined by G-Refine outperform those from alternative optimization methods on more than 10 quality metrics across 4 databases, improving overall image quality and supporting the broader adoption of T2I models in practical applications.
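The way the two quality maps can steer a targeted refinement step may be easier to see in a small sketch. The code below is only an illustration under stated assumptions, not the authors' implementation: the function names (`merge_phrase_maps`, `refine_mask`, `apply_refinement`), the weighted-average merge, the pixel-wise minimum, and the fixed threshold are all hypothetical choices, and `enhanced` stands in for the output of whatever enhancement model is used.

```python
import numpy as np


def merge_phrase_maps(phrase_maps, weights=None):
    """Merge per-phrase alignment maps (each H x W, values in [0, 1]).

    Assumption: a weighted average is used here for illustration only;
    the paper merges phrase-level results along a syntax tree.
    """
    stack = np.stack(phrase_maps, axis=0)            # shape (N, H, W)
    if weights is None:
        weights = np.ones(len(phrase_maps))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                # normalize to sum to 1
    return np.tensordot(weights, stack, axes=1)      # shape (H, W)


def refine_mask(pq_map, aq_map, threshold=0.5):
    """Flag pixels where either quality map falls below the threshold.

    Assumption: the pixel-wise minimum and the 0.5 threshold are
    hypothetical; they are not taken from the paper.
    """
    return np.minimum(pq_map, aq_map) < threshold


def apply_refinement(image, enhanced, mask):
    """Take enhanced pixels only where the mask is set; keep the rest."""
    return np.where(mask[..., None], enhanced, image)


# Toy 2 x 2 example with made-up quality scores.
pq_map = np.array([[0.9, 0.2], [0.8, 0.7]])          # perceptual quality
phrase_a = np.array([[1.0, 1.0], [0.0, 1.0]])        # alignment, phrase A
phrase_b = np.array([[1.0, 1.0], [0.4, 1.0]])        # alignment, phrase B
aq_map = merge_phrase_maps([phrase_a, phrase_b])

mask = refine_mask(pq_map, aq_map)
image = np.zeros((2, 2, 3))                          # original image (H, W, C)
enhanced = np.ones((2, 2, 3))                        # stand-in enhanced image
refined = apply_refinement(image, enhanced, mask)
```

In this toy case only the top-right pixel (low perceptual score) and the bottom-left pixel (low alignment score) are replaced, which mirrors the stated goal of enhancing low-quality regions while leaving high-quality ones untouched.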