Abstract: With the rapid advancement of Vision Language Models (VLMs), VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression and capture the multifaceted nature of IQA tasks. However, current methods are still far from practical usage. First, prior works focus narrowly on specific sub-tasks or settings, which do not align with diverse real-world applications. Second, their performance is sub-optimal due to limitations in dataset coverage, scale, and quality. To overcome these challenges, we introduce Depicted image Quality Assessment in the Wild (DepictQA-Wild). Our method includes a multi-functional IQA task paradigm that encompasses both assessment and comparison tasks, brief and detailed responses, and full-reference and non-reference scenarios. We introduce a ground-truth-informed dataset construction approach to enhance data quality, and scale up the dataset to 495K under the brief-detail joint framework. Consequently, we construct a comprehensive, large-scale, and high-quality dataset, named DQ-495K. We also retain image resolution during training to better handle resolution-related quality issues, and estimate a confidence score that helps filter out low-quality responses. Experimental results demonstrate that DepictQA-Wild significantly outperforms traditional score-based methods, prior VLM-based IQA models, and proprietary GPT-4V in distortion identification, instant rating, and reasoning tasks. Our advantages are further confirmed by real-world applications, including assessing web-downloaded images and ranking model-processed images. Datasets and code will be released at https://depictqa.github.io/depictqa-wild/.

Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment

Exploring Rich Subjective Quality Information for Image Quality Assessment in the Wild

Improving IQA Performance Based on Deep Mutual Learning

A Feature-Enriched Completely Blind Image Quality Evaluator

2AFC Prompting of Large Multimodal Models for Image Quality Assessment

MD-IQA: Learning Multi-scale Distributed Image Quality Assessment with Semi Supervised Learning for Low Dose CT

Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

DSMix: Distortion-Induced Sensitivity Map Based Pre-training for No-Reference Image Quality Assessment

Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment

Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models

No Training Blind Image Quality Assessment

GMC-IQA: Exploiting Global-correlation and Mean-opinion Consistency for No-reference Image Quality Assessment

UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment

No-Reference Image Quality Assessment: Obtain MOS from Image Quality Score Distribution

LG-IQA: Integration of Local and Global Features for No-Reference Image Quality Assessment

Image Quality Assessment Based on Local Linear Information and Distortion-Specific Compensation

A survey on IQA

dipIQ: Blind Image Quality Assessment by Learning-to-Rank Discriminable Image Pairs

MobileIQA: Exploiting Mobile-level Diverse Opinion Network For No-Reference Image Quality Assessment Using Knowledge Distillation

Descriptive Image Quality Assessment in the Wild

Sliced Maximal Information Coefficient: A Training-Free Approach for Image Quality Assessment Enhancement