Abstract: Blind video quality assessment (BVQA) techniques estimate the perceived quality of a degraded video without access to its reference. Many deep learning-based approaches have been proposed so far. These methods typically pool frame-level features into a video representation and predict quality from it. The features are conventionally taken from the final convolutional layers of the network, or occasionally from its mid-layers. Such approaches generally assume that degradations affect the high-level features and general patterns captured by the last layers, and they disregard the details of each frame's appearance. Because these features correlate relatively poorly with video quality, such methods mostly have to rely on ensemble techniques. In this study, we introduce a novel method for obtaining frame-level deep features for video quality assessment. To this end, we examine the correlations between the deep feature maps of specific layers of a pre-trained network, or more specifically their similarities, as features for assessing video quality. The covariance matrix, i.e. the Gram matrix, which captures the correlation between all feature maps of a specific mid-layer, expresses these deep feature relationships. The structural details of each frame's texture and color, in other words the frame's appearance, are reflected in these relationships and correlate significantly with the perceived quality of a given video. In fact, the feature map relationships extracted at different granularities can effectively capture the influence of various distortions. Experiments on three UGC video quality benchmarks, YouTube-UGC, KoNViD-1k, and LIVE-VQC, each evaluated individually, yield acceptable results.
The SROCCs obtained with the proposed features, extracted from the EfficientNet B4 network, improve by around 10%, 10%, and 7% on YouTube-UGC, KoNViD-1k, and LIVE-VQC, respectively, compared to typical features taken from the last convolutional layers (avgpool). Moreover, in 4 out of 6 cross-dataset tests where the SVR is trained on YouTube-UGC or KoNViD-1k, the average SROCC is around 0.22% higher than the state of the art. Thus, using the feature map correlations of the mid-layers of a pre-trained network as frame-level features yields better cross-dataset results, and the proposed method is computationally efficient. The implementation of our method is available at https://github.com/amirh-bakhtiari/FMC-VQA.
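
The core idea of the abstract, using the Gram matrix of a layer's feature maps as a frame-level descriptor, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names `gram_features` and `video_descriptor`, the normalization by the number of spatial positions, the use of the upper triangle only, and the mean pooling over frames are all assumptions made for illustration, and random arrays stand in for real EfficientNet activations. The sketch uses plain (uncentered) inner products between feature maps.

```python
import numpy as np


def gram_features(feature_maps: np.ndarray) -> np.ndarray:
    """Return the vectorized Gram matrix of one frame's feature maps.

    feature_maps: array of shape (C, H, W), the activations of a single
    layer for a single frame (hypothetical input for this sketch).
    """
    c, h, w = feature_maps.shape
    flat = feature_maps.reshape(c, h * w)   # one row per feature map
    gram = flat @ flat.T / (h * w)          # (C, C); normalization is an assumption
    rows, cols = np.triu_indices(c)         # the matrix is symmetric, keep one half
    return gram[rows, cols]


def video_descriptor(per_frame_maps) -> np.ndarray:
    """Average the frame-level Gram features over time (assumed pooling)."""
    return np.mean([gram_features(f) for f in per_frame_maps], axis=0)


# Stand-in data: 5 frames, 8 feature maps of size 14x14 each.
rng = np.random.default_rng(0)
frames = [rng.standard_normal((8, 14, 14)) for _ in range(5)]
desc = video_descriptor(frames)
print(desc.shape)  # (36,) since 8 * 9 / 2 = 36 unique entries
```

In a full pipeline, a descriptor of this kind would be fed to a regressor (the abstract mentions an SVR) and evaluated by rank correlation (SROCC) against subjective scores; those steps are omitted here.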

Video Quality Assessment Using Neural Network Based On Multifeature Extraction

Human Visual Perception Based Image Quality Assessment for Video Prediction

Objective Quality Assessment of Retargeted Images Based on RBF Neural Network with Structural Distortion and Content Change

User-generated Video Quality Assessment: A Subjective and Objective Study

Hyperspectral Image Quality Evaluation Based On Multi-Model Fusion

A Method of Video Quality Assessment Based on the Sensitive Region

Video quality assessment with dense features and ranking pooling

Multi-Frame Quality Enhancement for Compressed Video

Convolutional Neural Networks for Video Quality Assessment

Non-Reference Quality Monitoring of Digital Images using Gradient Statistics and Feedforward Neural Networks

Attention-Guided Neural Networks for Full-Reference and No-Reference Audio-Visual Quality Assessment

Perceptual Quality Enhancement with Multi-scale Deep Learning for Video Transmission: A QoE Perspective

Feature Maps Correlation-based Video Quality Assessment

FQA-Net: an Efficient Neural Network for Blind Image Quality Assessment

BP-based estimate on network video QoE

Video Quality Assessment: A Comprehensive Survey

Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

Multi-feature 360 Video Quality Estimation

Quality assessment of perceptual color video based on a top-down framework and quaternion

RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content

A deep learning approach for quality enhancement of surveillance video