Abstract: Image Quality Assessment (IQA) with references plays an important role in optimizing and evaluating computer vision tasks. Traditional methods assume that all pixels of the reference and test images are fully aligned. Such Aligned-Reference IQA (AR-IQA) approaches fail to address many real-world problems that involve various geometric deformations between the two images. Although significant effort has been made to tackle the Geometrically-Disparate-Reference IQA (GDR-IQA) problem, it has been addressed in a task-dependent fashion, for example, by dedicated designs for image super-resolution and retargeting, or by assuming that the geometric distortions are small enough to be countered by translation-robust filters or explicit image registration. Here we rethink this problem and propose a unified, non-training-based Deep Structural Similarity (DeepSSIM) approach that addresses the above problems in a single framework: it assesses the structural similarity of deep features in a simple but efficient way and uses an attention calibration strategy to alleviate attention deviation. The proposed method, without application-specific design, achieves state-of-the-art performance on AR-IQA datasets while showing strong robustness to various GDR-IQA test cases. Interestingly, our tests also show the effectiveness of DeepSSIM as an optimization tool for training image super-resolution, enhancement, and restoration models, implying even wider generalizability. (Source code will be made public after the review is completed.)
### What problem does this paper attempt to address?
This paper addresses an important problem in image quality assessment (IQA): how to assess quality accurately when the reference image and the test image differ geometrically. Traditional AR-IQA (Aligned-Reference IQA) methods assume that the reference and test images are perfectly aligned pixel by pixel, but in many real-world applications the two images differ by geometric deformations (such as rotation, scaling, or shearing), which renders these methods ineffective.
To solve this problem, the authors propose a unified, training-free Deep Structural Similarity (DeepSSIM) metric, aiming at:
1. **Handling geometric differences**: By extracting deep features and computing their autocorrelations, it builds a deep-structure representation of the image that is robust to geometric differences.
2. **No task-specific design required**: It provides a general framework applicable to multiple IQA tasks, such as super-resolution, retargeting, and geometric transformation, without additional design for any specific task.
3. **Efficiency and strong performance**: Experiments show that the method achieves state-of-the-art performance on AR-IQA datasets and is also strongly robust in GDR-IQA (Geometrically-Disparate-Reference IQA) tasks.
### Formula Summary
- Deep feature extraction:
\[
F(I)=\{m_c^{(5)}; c = 1,...,512\}
\]
where \(m_c^{(5)}\) denotes the \(c\)-th of the 512 feature maps output by conv5_1, the first convolutional layer of the last (fifth) stage of the VGG16 network.
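The set notation for \(F(I)\) hides a reshaping step that the Gram-matrix formula relies on: each conv5_1 map is vectorized and the vectors are stacked as rows of a \(512\times HW\) matrix. The NumPy sketch below illustrates only that bookkeeping. It assumes the conv5_1 activations have already been computed elsewhere (for example with a pretrained VGG16 from a deep-learning framework) and arrive as an array of shape (512, H, W); random numbers stand in for them here, and the helper names `feature_matrix` and `conv5_1_size` are hypothetical. Because VGG16 applies four 2×2 max-pooling layers before conv5_1, the maps are 1/16 of the input resolution, e.g. 14 × 14 for a 224 × 224 image.

```python
import numpy as np


def feature_matrix(maps: np.ndarray) -> np.ndarray:
    """Stack C feature maps of shape (C, H, W) into a (C, H*W) matrix.

    Row c is the vectorized map m_c, i.e. the matrix form of F(I).
    """
    if maps.ndim != 3:
        raise ValueError("expected an array of shape (C, H, W)")
    c, h, w = maps.shape
    return maps.reshape(c, h * w)


def conv5_1_size(height: int, width: int) -> tuple:
    """Spatial size of VGG16 conv5_1 output (hypothetical helper).

    The 3x3 convolutions use padding 1 and keep the size; the four
    2x2 max-pools before conv5_1 each halve height and width.
    """
    for _ in range(4):
        height, width = height // 2, width // 2
    return height, width


# Random stand-in for real conv5_1 activations of a 224x224 image.
rng = np.random.default_rng(0)
h, w = conv5_1_size(224, 224)
maps = rng.standard_normal((512, h, w))
F = feature_matrix(maps)
print(F.shape)  # (512, 196)
```

Row `F[c]` is simply `maps[c]` flattened in row-major order, so any consistent flattening works as long as the same order is used for every channel.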
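- Deep-structure representation:
\[
RDS(I)=F(I)\cdot F(I)^T
\]
Here \(F(I)\) is treated as a \(512\times HW\) matrix whose \(c\)-th row is the vectorized map \(m_c^{(5)}\) (with \(H\times W\) the spatial size of the conv5_1 output), so \(RDS(I)\) is the \(512\times 512\) Gram matrix of inner products between channels. This autocorrelation of deep features serves as the deep-structure representation of the image.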
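Entry \((i,j)\) of the Gram matrix is \(\sum_p m_i(p)\,m_j(p)\), where \(p\) runs over all \(N=HW\) spatial positions. Because positions are summed out, rearranging them leaves the representation unchanged: for any \(N\times N\) permutation matrix \(P\), \((FP)(FP)^T = FPP^TF^T = FF^T\). This is one way to see why such a representation tolerates geometric differences. The NumPy sketch below checks this on random stand-in features, using the unnormalized product as written; the name `deep_structure` is hypothetical. It demonstrates invariance only at the feature-map level. Real image deformations do not map exactly onto permutations of conv5_1 positions (boundary effects, interpolation, receptive-field changes), so in practice the robustness is approximate.

```python
import numpy as np


def deep_structure(maps: np.ndarray) -> np.ndarray:
    """Gram matrix RDS = F F^T for feature maps of shape (C, H, W).

    Unnormalized, following RDS(I) = F(I) . F(I)^T as written.
    """
    c = maps.shape[0]
    F = maps.reshape(c, -1)  # (C, N): one vectorized map per row
    return F @ F.T           # (C, C): channel-by-channel inner products


rng = np.random.default_rng(0)
maps = rng.standard_normal((512, 14, 14))  # stand-in for conv5_1 activations
rds = deep_structure(maps)

# Apply the same spatial rearrangement to every channel.
flipped = maps[:, :, ::-1]                            # horizontal flip
shifted = np.roll(maps, shift=(3, -5), axis=(1, 2))   # circular shift
perm = rng.permutation(14 * 14)                       # arbitrary shuffle of positions
shuffled = maps.reshape(512, -1)[:, perm].reshape(512, 14, 14)

for name, variant in [("flip", flipped), ("shift", shifted), ("shuffle", shuffled)]:
    print(name, np.allclose(deep_structure(variant), rds))  # True for all three
```

`np.allclose` is used instead of exact equality because summing the same products in a different order can change the last few floating-point bits.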
- DeepSSIM metric formula:
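\[
Q_{\text{DeepSSIM}}=\frac{1}{K}\sum_{k=1}^{K}\frac{2\sigma^{(k)}_{RDS(X)RDS(Y)}+\xi}{\left(\sigma^{(k)}_{RDS(X)}\right)^2+\left(\sigma^{(k)}_{RDS(Y)}\right)^2+\xi}
\]
where \(X\) and \(Y\) are the two images being compared (reference and test), \(K\) is the number of compared components, \(\sigma^{(k)}_{RDS(X)RDS(Y)}\) is the covariance between the \(k\)-th components \(RDS(X)^{(k)}\) and \(RDS(Y)^{(k)}\), \((\sigma^{(k)}_{RDS(X)})^2\) and \((\sigma^{(k)}_{RDS(Y)})^2\) are their respective variances, and \(\xi\) is a small positive constant that keeps the denominator from being zero. As in the contrast-structure term of SSIM, each ratio is at most 1 and equals 1 when the two components are identical.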
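The summary does not specify what the \(K\) compared components \(RDS(X)^{(k)}\) are. The NumPy sketch below assumes, purely for illustration, that component \(k\) is the \(k\)-th row of each Gram matrix (so \(K=512\)). It computes the per-row covariance and variances over the row entries and averages the SSIM-style ratio. The function names, the row-wise choice, and \(\xi=10^{-8}\) are all hypothetical, and the attention calibration step is not modeled. The "distorted" input is just random stand-in features plus noise, so the resulting score carries no meaning beyond showing the computation.

```python
import numpy as np


def deep_structure(maps):
    """Unnormalized Gram matrix F F^T for feature maps of shape (C, H, W)."""
    c = maps.shape[0]
    F = maps.reshape(c, -1)
    return F @ F.T


def deepssim_score(rds_x, rds_y, xi=1e-8):
    """Mean of (2*cov + xi) / (var_x + var_y + xi) over K components.

    ASSUMPTION (illustrative only): component k is row k of each Gram matrix.
    """
    dx = rds_x - rds_x.mean(axis=1, keepdims=True)
    dy = rds_y - rds_y.mean(axis=1, keepdims=True)
    cov = (dx * dy).mean(axis=1)    # sigma^(k)_{RDS(X)RDS(Y)}
    var_x = (dx * dx).mean(axis=1)  # (sigma^(k)_{RDS(X)})^2
    var_y = (dy * dy).mean(axis=1)  # (sigma^(k)_{RDS(Y)})^2
    return float(np.mean((2 * cov + xi) / (var_x + var_y + xi)))


rng = np.random.default_rng(0)
ref = rng.standard_normal((512, 14, 14))           # stand-in reference features
test = ref + 0.5 * rng.standard_normal(ref.shape)  # stand-in distorted features
rds_ref, rds_test = deep_structure(ref), deep_structure(test)

print(deepssim_score(rds_ref, rds_ref))   # 1.0
print(deepssim_score(rds_ref, rds_test))  # a value below 1.0
```

Swapping the two arguments gives the same score, since the formula is symmetric in \(X\) and \(Y\).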
### Main Contributions
1. Proposed a deep-structure representation based on the autocorrelation of deep features. This representation correlates positively with subjective scores in AR-IQA and is robust to geometric deformations in GDR-IQA.
2. Constructed a unified, training-free DeepSSIM metric that evaluates image quality by comparing the deep-structure representations of the reference image and the test image.
3. Introduced an attention calibration strategy to alleviate the attention deviation that arises when pre-trained networks are used to extract IQA-oriented deep features.
4. Showed experimentally that DeepSSIM achieves state-of-the-art performance on both AR-IQA and GDR-IQA tasks and can serve as an effective optimization tool for training computer vision models such as image super-resolution, enhancement, and restoration.
Together, these contributions offer a new approach to IQA when the reference image and the test image differ geometrically.