Abstract: No-reference (NR) video quality assessment (VQA) is a challenging problem because the scarcity of annotated samples makes model training difficult. Previous work commonly uses transfer learning to directly migrate models pre-trained on image databases, which suffers from poor domain adaptation. Recently, self-supervised representation learning has attracted attention because it does not depend on large-scale labeled data. However, existing self-supervised representation learning methods consider only the distortion types and contents of videos; the intrinsic properties of videos that matter for the VQA task remain to be investigated. To address this, we propose a novel multi-task self-supervised representation learning framework to pre-train a video quality assessment model. Specifically, we consider the effects of distortion degrees, distortion types, and frame rates on the perceived quality of videos, and use them as guidance to generate self-supervised samples and labels. Then, using three pretext tasks, we optimize the ability of the VQA model to capture spatio-temporal differences between the original video and its distorted version. The resulting framework not only eases the requirements on the quality of the original video but also benefits from the self-supervised labels and the Siamese network. In addition, we propose a Transformer-based VQA model in which the short-term spatio-temporal dependencies of videos are modeled by a 3D-CNN and a 2D-CNN, and the long-term spatio-temporal dependencies are modeled by a Transformer because of its excellent long-term modeling capability. We evaluated the proposed method on four public video quality assessment databases and found that it is competitive with all compared VQA algorithms.
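As a rough illustration of how distortion type, distortion degree, and frame rate could guide the generation of self-supervised samples and labels, the following minimal sketch builds one (reference, distorted) clip pair with three pretext labels. It is not the authors' implementation: the two distortions, the three severity levels, the temporal strides, and every function name here are assumptions chosen for illustration, and a clip is represented as a plain nested list of grayscale frames with pixel values in [0, 1].

```python
import random

# All three label spaces below are hypothetical choices for illustration.
DISTORTION_TYPES = ("gaussian_noise", "horizontal_blur")
DEGREES = (1, 2, 3)            # severity levels
FRAME_STRIDES = (1, 2, 4)      # temporal subsampling factors (frame-rate proxy)


def add_noise(frame, degree, rng):
    """Add zero-mean Gaussian noise whose sigma grows with the degree."""
    sigma = 0.05 * degree
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in frame]


def horizontal_blur(frame, degree):
    """Mean-filter each row with radius `degree`, clamping at the borders."""
    blurred = []
    for row in frame:
        w = len(row)
        new_row = []
        for x in range(w):
            window = [row[min(w - 1, max(0, x + d))]
                      for d in range(-degree, degree + 1)]
            new_row.append(sum(window) / len(window))
        blurred.append(new_row)
    return blurred


def make_pretext_sample(clip, rng):
    """Return (reference, distorted, labels) for one self-supervised sample."""
    type_idx = rng.randrange(len(DISTORTION_TYPES))
    degree_idx = rng.randrange(len(DEGREES))
    stride_idx = rng.randrange(len(FRAME_STRIDES))
    degree = DEGREES[degree_idx]

    # Subsample frames first so both branches share the same frame rate.
    reference = clip[::FRAME_STRIDES[stride_idx]]
    if DISTORTION_TYPES[type_idx] == "gaussian_noise":
        distorted = [add_noise(f, degree, rng) for f in reference]
    else:
        distorted = [horizontal_blur(f, degree) for f in reference]

    # The labels come for free from the sampling choices; no human scores needed.
    labels = {"type": type_idx, "degree": degree_idx, "frame_rate": stride_idx}
    return reference, distorted, labels


if __name__ == "__main__":
    data_rng = random.Random(0)
    clip = [[[data_rng.random() for _ in range(8)] for _ in range(8)]
            for _ in range(8)]
    ref, dist, labels = make_pretext_sample(clip, random.Random(1))
    print(len(ref), labels)
```

In a Siamese setup, the reference and distorted clips would pass through the same backbone, and the three labels would supervise separate prediction heads. How the actual framework combines those losses is not specified in the abstract, so the sketch leaves that part out.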
