Abstract: Video quality is a primary concern of video service providers. Built upon deep neural networks, video quality assessment (VQA) has progressed rapidly. Although existing works have introduced knowledge of the human visual system (HVS) into VQA, two limitations still hinder the full exploitation of HVS: incomplete modeling that covers only a few HVS characteristics, and insufficient connection among these characteristics. In this article, we present a novel spatial-temporal VQA method termed HVS-5M, in which we design five modules to simulate five characteristics of HVS and connect these modules in a bioinspired, cooperative manner. Specifically, in the spatial domain, the visual saliency module first extracts a saliency map. Then, the content-dependency and edge masking modules extract the content and edge features, respectively, both of which are weighted by the saliency map to highlight the regions that human viewers are likely to attend to. In the temporal domain, the motion perception module extracts dynamic temporal features. Finally, the temporal hysteresis module simulates the human memory mechanism and comprehensively evaluates the video quality from the fused spatial and temporal features. Extensive experiments show that HVS-5M outperforms state-of-the-art VQA methods. Ablation studies are further conducted to verify the contribution of each module to the proposed method. The source code is available at https://github.com/GZHU-DVL/HVS-5M.
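To make two of the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that saliency weighting amounts to a normalized weighted spatial pooling of a feature map, and that temporal hysteresis follows a commonly used formulation in which a memory term (the worst recent score) is blended with a softmin-weighted average of upcoming scores. The function names `saliency_weight` and `hysteresis_pool` and the parameters `tau` and `gamma` are hypothetical and chosen only for illustration.

```python
import numpy as np

def saliency_weight(features, saliency):
    """Pool a (C, H, W) feature map to (C,) using an (H, W) saliency map as weights.

    Assumption: saliency acts as a spatial weighting; the paper may combine
    the saliency map with features differently.
    """
    w = saliency / (saliency.sum() + 1e-8)  # normalize weights to sum to ~1
    return (features * w[None, :, :]).sum(axis=(1, 2))

def hysteresis_pool(frame_scores, tau=3, gamma=0.5):
    """Pool per-frame quality scores with a simple temporal hysteresis model.

    Assumption: memory = minimum over the previous `tau` scores, current =
    softmin-weighted mean over the next `tau` scores (including frame t),
    blended with weight `gamma`. Higher scores mean better quality.
    """
    q = np.asarray(frame_scores, dtype=float)
    T = len(q)
    pooled = np.empty(T)
    for t in range(T):
        # Memory term: viewers remember the worst quality seen recently.
        past = q[max(0, t - tau):t] if t > 0 else q[:1]
        memory = past.min()
        # Current term: low-quality frames get larger (softmin) weights.
        future = q[t:min(T, t + tau)]
        w = np.exp(-future)
        current = (future * w).sum() / w.sum()
        pooled[t] = gamma * memory + (1.0 - gamma) * current
    return pooled.mean()

if __name__ == "__main__":
    feats = np.random.rand(4, 8, 8)   # toy feature map
    sal = np.random.rand(8, 8)        # toy saliency map
    print(saliency_weight(feats, sal).shape)
    print(hysteresis_pool([5, 5, 1, 5, 5]))
```

Under these assumptions, a single poor frame pulls the pooled score below the plain average of the frame scores, which is the asymmetry that hysteresis pooling is meant to capture.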

Visual Attention Modeling for Video Quality Assessment with Structural Similarity

Learning Stereoscopic Visual Attention Model for 3D Video

Spatio-temporal salience based video quality assessment

Video Saliency Detection via Dynamic Consistent Spatio-Temporal Attention Modelling

A Spatiotemporal Weighted Dissimilarity-Based Method for Video Saliency Detection

Visual Saliency and Distortion Weighting Based Video Quality Assessment

A Method of Video Quality Assessment Based on the Sensitive Region

Stereoscopic Video Quality Assessment Based on Visual Attention and Just-Noticeable Difference Models

TSMSAN: A Three-Stream Multi-Scale Attentive Network for Video Saliency Detection

Using Spatial-Temporal Attention for Video Quality Evaluation

Spatiotemporal Saliency Detection based Video Quality Assessment

Eye Fixation Assisted Video Saliency Detection Via Total Variation-based Pairwise Interaction

A Spatial-Frequency-temporal Domain Based Saliency Model for Low Contrast Video Sequences

Video Quality Assessment Combining with Human Visual Gaze Characteristics

A visual attention model for video based on Non-Negative matrix factorization sparseness on parts

Visual Attention Modeling for Stereoscopic Video: A Benchmark and Computational Model

A Spatial-Temporal Video Quality Assessment Method via Comprehensive HVS Simulation

Saliency Inspired Quality Assessment of Stereoscopic 3D Video

Human Vision Attention Mechanism-Inspired Temporal-Spatial Feature Pyramid for Video Saliency Detection

A Visual-Attention Model Using Earth Mover's Distance-Based Saliency Measurement and Nonlinear Feature Combination

Revisiting Video Saliency: A Large-scale Benchmark and a New Model