Abstract:<p>With the rapid development of communication networks, effective video quality assessment (VQA) models that can guide video transmission and compression technologies are in high demand. This paper proposes a general-purpose full-reference VQA method that combines DenseNet with spatial pyramid pooling and RankNet. The method extracts high-level distortion representations and global spatial information from samples, and it also characterizes the temporal correlation among frames. First, a pretrained DenseNet is modified and fine-tuned to extract high-level features of distorted videos. Then, the DenseNet module is equipped with spatial pyramid pooling so that it can process inputs of arbitrary spatial resolution. As a result, inputs with the same spatial resolution as the original distorted video can be processed by the trained DenseNet to generate frame-level quality scores, which directly takes the global spatial information of the video into account. Finally, learning to rank is introduced to explore the high-level temporal correlation of distorted videos, with RankNet serving as the temporal pooling function. Experimental results on two public VQA databases show that the proposed algorithm is consistent with human visual perception.</p>
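The two mechanisms named in the abstract can be illustrated with a minimal, framework-free Python sketch. It is not the authors' code, and all names are hypothetical. `spatial_pyramid_pool` is a generic spatial pyramid pooling layer. It max-pools each channel of a feature map over 1×1, 2×2 and 4×4 grids, so the output length depends only on the channel count and not on the input height and width. These pyramid levels are an assumption, since the abstract does not state them. `ranknet_loss` is the standard RankNet pairwise cross-entropy on the difference of two scores. The abstract does not describe how DenseNet outputs are fed to RankNet for temporal pooling. Here, therefore, `s_i` and `s_j` are assumed quality scores for two items, and `target` is the probability that item i should rank above item j.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def spatial_pyramid_pool(fmap: FeatureMap, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool every channel over an n x n grid of bins for each pyramid level.

    Output length is C * sum(n * n for n in levels), independent of H and W.
    Bin edges use floor(start) / ceil(end), so every bin is non-empty.
    """
    out: List[float] = []
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    for n in levels:
        for c in range(channels):
            for i in range(n):
                r0, r1 = (i * height) // n, _ceil_div((i + 1) * height, n)
                for j in range(n):
                    c0, c1 = (j * width) // n, _ceil_div((j + 1) * width, n)
                    out.append(max(fmap[c][r][k]
                                   for r in range(r0, r1)
                                   for k in range(c0, c1)))
    return out


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ranknet_prob(s_i: float, s_j: float, sigma: float = 1.0) -> float:
    """Modeled probability that item i ranks above item j."""
    return 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))


def ranknet_loss(s_i: float, s_j: float, target: float, sigma: float = 1.0) -> float:
    """Cross-entropy between target probability and ranknet_prob(s_i, s_j).

    Equivalent closed form: (1 - target) * sigma * d + log(1 + exp(-sigma * d)),
    with d = s_i - s_j.
    """
    d = s_i - s_j
    return (1.0 - target) * sigma * d + _softplus(-sigma * d)


if __name__ == "__main__":
    # Two 2-channel feature maps of different spatial sizes pool to equal-length vectors.
    a = [[[1.0] * 5 for _ in range(3)] for _ in range(2)]
    b = [[[1.0] * 11 for _ in range(7)] for _ in range(2)]
    print(len(spatial_pyramid_pool(a)), len(spatial_pyramid_pool(b)))  # 42 42
    # Pair in which item i is known to be better (target = 1).
    print(ranknet_loss(2.0, 0.5, target=1.0))
```

The pooled vector has a fixed length for any input size, so a fully connected regression head can follow it regardless of frame resolution. This is the property the abstract relies on when it feeds full-resolution distorted frames to the network. The RankNet loss only constrains score differences. It therefore learns an ordering of qualities rather than absolute values.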

No-reference Video Quality Assessment Based on Spatio-temporal Perception Feature Fusion

Human Visual Perception Based Image Quality Assessment for Video Prediction

Learning Generalized Spatial-Temporal Deep Feature Representation for No-Reference Video Quality Assessment

No-Reference Quality Assessment of In-Capture Distorted Videos

Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment

Deep Neural Networks for End-to-End Spatiotemporal Video Quality Prediction and Aggregation

SpatioTemporal Feature Integration and Model Fusion for Full Reference Video Quality Assessment

Video quality assessment with dense features and ranking pooling

Attention-Guided Neural Networks for Full-Reference and No-Reference Audio-Visual Quality Assessment

Using Spatial‐Temporal Attention for Video Quality Evaluation

No-Reference Multi-Level Video Quality Assessment Metric for 3D-Synthesized Videos

Blind Video Quality Assessment for Ultra-High-Definition Video Based on Super-Resolution and Deep Reinforcement Learning

Perceptual Quality Assessment for Video Frame Interpolation

A Spatial-Temporal Video Quality Assessment Method via Comprehensive HVS Simulation

Video Quality Assessment Based on Swin TransformerV2 and Coarse to Fine Strategy

C3DVQA: Full-Reference Video Quality Assessment with 3D Convolutional Neural Network

Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment

One Transform To Compute Them All: Efficient Fusion-Based Full-Reference Video Quality Assessment

Visual Quality Assessment for Web Videos

Video Quality Assessment Based on Swin Transformer with Spatio-Temporal Feature Fusion and Data Augmentation

HVS Revisited: A Comprehensive Video Quality Assessment Framework