ExIQA: Explainable Image Quality Assessment Using Distortion Attributes

Sepehr Kazemi Ranjbar, Emad Fatemizadeh
2024-09-11
Abstract: Blind Image Quality Assessment (BIQA) aims to develop methods that estimate the quality scores of images in the absence of a reference image. In this paper, we approach BIQA from a distortion identification perspective, where our primary goal is to predict distortion types and strengths using Vision-Language Models (VLMs), such as CLIP, due to their extensive knowledge and generalizability. Based on these predicted distortions, we then estimate the quality score of the image. To achieve this, we propose an explainable approach for distortion identification based on attribute learning. Instead of prompting VLMs with the names of distortions, we prompt them with the attributes or effects of distortions and aggregate this information to infer the distortion strength. Additionally, we consider multiple distortions per image, making our method more scalable. To support this, we generate a dataset consisting of 100,000 images for efficient training. Finally, attribute probabilities are retrieved and fed into a regressor to predict the image quality score. The results show that our approach, besides its explainability and transparency, achieves state-of-the-art (SOTA) performance across multiple datasets in both PLCC and SRCC metrics. Moreover, the zero-shot results demonstrate the generalizability of the proposed approach.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two key challenges in Blind Image Quality Assessment (BIQA):

1. How to accurately predict the quality score of an image without a reference image.
2. How to make the model interpretable and transparent enough to be relied on in critical fields such as medical imaging.

Specifically, the authors propose a new method that targets the following problems:

1. **Predicting image distortion types and strengths**
   - Existing BIQA methods usually rely on predefined distortion names, which limits the extensibility and accuracy of the model. This paper instead proposes a method based on distortion attributes, using Vision-Language Models (VLMs) such as CLIP to identify the distortion types present in an image and their strengths.
   - The authors use the visual effects or attributes of a distortion (rather than its name) as text prompts, which lets the model identify multiple types of distortion more reliably and extend to previously unseen distortion types.
2. **Estimating image quality scores**
   - After identifying the distortion types and strengths, the authors use this information to estimate the image quality score. To keep the model transparent, only the probabilities of the distortion attributes are used as input features of the regressor. Excluding unrelated features is intended to improve both interpretability and generalization.
3. **Handling multi-distortion images**
   - Existing datasets usually contain images with only a single distortion, which limits model performance. The authors therefore generate a dataset of 100,000 multi-distortion images to support training and evaluation on such images.
4. **Achieving zero-shot performance**
   - The authors report zero-shot results on unseen datasets, indicating that the model generalizes well.
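The pipeline described above (attribute prompts, per-attribute probabilities, then a regressor) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the attribute phrases are hypothetical, random vectors stand in for CLIP image and text embeddings, each attribute is scored with an assumed "present" versus "absent" prompt pair so that several distortions can be active at once, and a closed-form ridge regressor stands in for whatever regressor the paper uses.

```python
import numpy as np

# Hypothetical attribute phrases (illustrative only, not the paper's list).
ATTRIBUTES = ["blurry edges", "blocky artifacts", "grainy noise", "washed-out colors"]

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def attribute_probabilities(image_emb, pos_embs, neg_embs, temperature=0.07):
    """Per-attribute probability from an assumed (present, absent) prompt pair.

    image_emb: (D,) image embedding; pos_embs, neg_embs: (K, D) text embeddings.
    Returns a (K,) array of values in (0, 1).
    """
    img = l2_normalize(image_emb)
    s_pos = l2_normalize(pos_embs) @ img  # similarity to "attribute present" prompts
    s_neg = l2_normalize(neg_embs) @ img  # similarity to "attribute absent" prompts
    # A two-way softmax over each pair equals a sigmoid of the scaled difference.
    return 1.0 / (1.0 + np.exp(-(s_pos - s_neg) / temperature))

def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression; the appended bias column is also regularized."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    """Apply the fitted weights (last entry is the bias) to attribute probabilities."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

# Stand-in embeddings: in practice these would come from a VLM such as CLIP.
rng = np.random.default_rng(0)
D, K, N = 32, len(ATTRIBUTES), 200
pos_embs = rng.normal(size=(K, D))
neg_embs = rng.normal(size=(K, D))
image_embs = rng.normal(size=(N, D))

# The regressor sees only the K attribute probabilities for each image.
probs = np.stack([attribute_probabilities(e, pos_embs, neg_embs) for e in image_embs])

# Toy quality labels: more distortion evidence lowers the score (an assumption).
true_w = -rng.uniform(0.5, 1.5, size=K)
scores = probs @ true_w + 5.0

w = fit_ridge(probs, scores)
pred = predict(w, probs)
```

Because the regressor input is just one probability per attribute, each learned weight can be read as how strongly that attribute pushes the predicted score up or down, which is the kind of transparency the summary refers to.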
In summary, this paper aims to improve both the accuracy and the interpretability of blind image quality assessment by changing how distortions are identified, while also addressing the shortcomings of existing methods in handling multi-distortion images and in generalizing to unseen data.
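The abstract reports results in terms of PLCC and SRCC, which measure linear and rank agreement between predicted scores and human opinion scores. A small sketch of how these two correlations are commonly computed is given below; the function names and the toy data are illustrative assumptions, and some IQA evaluations additionally fit a nonlinear mapping before computing PLCC, which is omitted here.

```python
import numpy as np

def plcc(x, y):
    """Pearson linear correlation coefficient between two score vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def average_ranks(x):
    """1-based ranks, with tied values assigned the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        mask = x == value
        ranks[mask] = ranks[mask].mean()
    return ranks

def srcc(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return plcc(average_ranks(x), average_ranks(y))

# Toy example: predictions are monotonic in the ground truth but not linear.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
ground_truth = [1.0, 4.0, 9.0, 16.0, 25.0]

linear_agreement = plcc(predicted, ground_truth)
rank_agreement = srcc(predicted, ground_truth)
```

In this toy case the ordering of the images is preserved exactly, so the rank correlation is perfect, while the linear correlation is slightly lower because the relationship is curved. Reporting both separates "ranks images correctly" from "predicts scores on the right scale".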