Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning

Jiarui Wang, Huiyu Duan, Guangtao Zhai, Xiongkuo Min
DOI: https://doi.org/10.48550/arxiv.2405.07346
2024-01-01
Abstract: Artificial Intelligence Generated Content (AIGC) has grown rapidly in recent years, and AI-based image generation in particular has gained widespread attention due to its efficient and imaginative image creation ability. However, AI-generated Images (AIGIs) may not satisfy human preferences due to their unique distortions, which highlights the need to understand and evaluate human preferences for AIGIs. To this end, we first establish a novel Image Quality Assessment (IQA) database for AIGIs, termed AIGCIQA2023+, which provides human visual preference scores and detailed preference explanations from three perspectives: quality, authenticity, and correspondence. Then, based on the constructed AIGCIQA2023+ database, this paper presents the MINT-IQA model to evaluate and explain human preferences for AIGIs from Multi-perspectives with INstruction Tuning. Specifically, the MINT-IQA model first learns to evaluate human preferences for AI-generated images from multiple perspectives. Then, via a vision-language instruction tuning strategy, MINT-IQA attains a powerful ability to understand and explain human visual preferences for AIGIs, which can be used as feedback to further improve its assessment capabilities. Extensive experimental results demonstrate that the proposed MINT-IQA model achieves state-of-the-art performance in understanding and evaluating human visual preferences for AIGIs, and that it also achieves competitive results on traditional IQA tasks compared with state-of-the-art IQA models. The AIGCIQA2023+ database and MINT-IQA model will be released to facilitate future research.