Abstract: Blind Image Quality Assessment (BIQA) aims to develop methods that estimate the quality scores of images in the absence of a reference image. In this paper, we approach BIQA from a distortion identification perspective, where our primary goal is to predict distortion types and strengths using Vision-Language Models (VLMs), such as CLIP, due to their extensive knowledge and generalizability. Based on these predicted distortions, we then estimate the quality score of the image. To achieve this, we propose an explainable approach for distortion identification based on attribute learning. Instead of prompting VLMs with the names of distortions, we prompt them with the attributes or effects of distortions and aggregate this information to infer the distortion strength. Additionally, we consider multiple distortions per image, making our method more scalable. To support this, we generate a dataset consisting of 100,000 images for efficient training. Finally, attribute probabilities are retrieved and fed into a regressor to predict the image quality score. The results show that our approach, besides its explainability and transparency, achieves state-of-the-art (SOTA) performance across multiple datasets in both PLCC and SRCC metrics. Moreover, the zero-shot results demonstrate the generalizability of the proposed approach.
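The two metrics named in the abstract can be illustrated with a small sketch. PLCC is the Pearson linear correlation between predicted and subjective scores, and SRCC is the Spearman rank-order correlation, which is the Pearson correlation computed on ranks. The function names and toy scores below are illustrative assumptions, not taken from the paper, and the sketch omits the nonlinear mapping that IQA evaluations often apply to predictions before computing PLCC.

```python
import numpy as np


def plcc(pred, target):
    """Pearson linear correlation coefficient between two score vectors."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.corrcoef(pred, target)[0, 1])


def _average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def srcc(pred, target):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(_average_ranks(pred), _average_ranks(target))


# Toy scores (hypothetical): monotonic but nonlinear relationship.
predicted_scores = [1.0, 2.0, 3.0, 4.0]
subjective_scores = [1.0, 4.0, 9.0, 16.0]
plcc_value = plcc(predicted_scores, subjective_scores)
srcc_value = srcc(predicted_scores, subjective_scores)
```

In this toy case the ranking is preserved exactly, so SRCC reaches its maximum while PLCC stays below it; this is why the two metrics are usually reported together.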

What problem does this paper attempt to address?

The paper addresses two key challenges in Blind Image Quality Assessment (BIQA):

1. How to accurately predict the quality score of an image without a reference image.
2. How to improve the interpretability and transparency of the model so that it can be relied on in critical fields such as medical imaging.

Specifically, the authors propose a new method that tackles the following problems:

1. **Predicting image distortion types and strengths**:
   - Existing BIQA methods usually rely on predefined distortion names, which limits the extensibility and accuracy of the model. This paper proposes a method based on distortion attributes, using Vision-Language Models (VLMs) such as CLIP to identify the distortion types present in an image and their strengths.
   - The authors use the visual effects or attributes of a distortion (rather than its name) as text prompts, which lets the model identify multiple distortion types more reliably and extend to unknown distortion types.
2. **Estimating image quality scores**:
   - After identifying the distortion types and strengths, the authors use this information to estimate the image quality score. To keep the model transparent, they use only the probabilities of the distortion attributes as input features to the regressor. This avoids the influence of irrelevant features and improves both interpretability and generalization.
3. **Handling multi-distortion images**:
   - Existing datasets usually contain only images with a single distortion, which limits model performance. The authors therefore generate a dataset of 100,000 multi-distortion images to support training and evaluation on such images.
4. **Achieving zero-shot performance**:
   - The authors report zero-shot results on unseen datasets, demonstrating the generalization ability of the model.

In summary, this paper aims to improve the accuracy and interpretability of blind image quality assessment through a better distortion identification method, while also addressing the shortcomings of existing methods in handling multi-distortion images and in generalization.
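The flow described above (attribute prompts, then attribute probabilities, then a regressor) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for CLIP image and text embeddings, the softmax temperature of 0.01 is an assumed value, the closed-form ridge regressor is a stand-in because the summary does not say which regressor is used, and all function names and the synthetic quality scores are hypothetical.

```python
import numpy as np


def attribute_probabilities(image_emb, text_embs, temperature=0.01):
    """Softmax over cosine similarities between one image embedding and
    the embeddings of the attribute prompts (temperature is an assumption)."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = text_embs @ image_emb / temperature
    logits = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def add_bias(features):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_ridge(features, scores, alpha=1e-3):
    """Closed-form ridge regression; the small penalty also covers the
    intercept column, which keeps the system well conditioned."""
    X = add_bias(features)
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ scores)


def predict(weights, features):
    return add_bias(features) @ weights


# Placeholder data: 5 attribute prompts, 50 images, 16-dim embeddings.
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(5, 16))    # stand-in for encoded attribute prompts
image_embs = rng.normal(size=(50, 16))  # stand-in for encoded images

# Only attribute probabilities are used as regressor inputs.
features = np.stack([attribute_probabilities(e, text_embs) for e in image_embs])

# Synthetic quality scores, used purely to exercise the regressor.
scores = features @ np.array([0.9, 0.7, 0.5, 0.3, 0.1])

weights = fit_ridge(features, scores)
predicted = predict(weights, features)
```

Because the regressor sees only per-attribute probabilities, each learned weight can be read as the contribution of one distortion attribute to the predicted score, which is the kind of transparency the summary emphasizes.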

ExIQA: Explainable Image Quality Assessment Using Distortion Attributes

LIQA: Lifelong Blind Image Quality Assessment

Blind Image Quality Estimation Via Distortion Aggravation

Blind Quality Assessment Based on Pseudo-Reference Image

Learning To Blindly Assess Image Quality In The Laboratory And Wild

GraphIQA: Learning Distortion Graph Representations for Blind Image Quality Assessment

Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment

Blind Image Quality Assessment Via Cross-View Consistency

SPIQ: A Self-Supervised Pre-Trained Model for Image Quality Assessment

Uncertainty-Aware Blind Image Quality Assessment in the Laboratory and Wild

From Distortion Manifold to Perceptual Quality: a Data Efficient Blind Image Quality Assessment Approach

Regression-free Blind Image Quality Assessment with Content-Distortion Consistency

Blind Image Quality Assessment Based on Multichannel Feature Fusion and Label Transfer

Learning without Human Scores for Blind Image Quality Assessment

Blind image quality assessment using Beltrami filter-based contrast features (BF-bCF) & LSTM network

Blind Image Quality Assessment Via Vision-Language Correspondence: A Multitask Learning Perspective

Making a “Completely Blind” Image Quality Analyzer

Learn to Evaluate Image Perceptual Quality Blindly from Statistics of Self-similarity

Learning to Rank for Blind Image Quality Assessment

Referenceless Quality Assessment For Contrast Distorted Image Using Hybrid Features

ARNIQA: Learning Distortion Manifold for Image Quality Assessment