End-to-End Blind Video Quality Assessment Based on Visual and Memory Attention Modeling

Xiaodi Guan, Fan Li, Yangfan Zhang, Pamela C. Cosman
DOI: https://doi.org/10.1109/tmm.2022.3189251
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Developing an objective quality assessment model for user-generated content (UGC) videos is significant for multimedia applications, but it is also challenging because of the diversity of video content and the unpredictability of distortions. To predict perceived quality, it is necessary to consider the human visual system, in which attention in the visual and memory domains is an essential component. Based on the idea that the stimulus-driven bottom-up mechanism and the cognition-driven top-down mechanism work in synergy to generate quality-aware attention, we propose an end-to-end blind video quality assessment (VQA) algorithm based on visual and memory attention modeling. First, a quality-aware visual attention module is established to obtain spatial-temporal attention-guided representations for frame-level quality perception. Specifically, an attention selection and confluence method is developed by circularly integrating the quality-aware attention information into spatial-temporal content features. Then, with the aid of a quality-aware memory attention module, the video-level attention-guided features are inferred through the dimension and attention reshaping of frame-level representations. The video quality is predicted with the guidance of frame-level visual attention and video-level memory attention in an end-to-end structure. Experimental results on five UGC-VQA databases (CVD2014, LIVE-Qualcomm, KoNViD-1k, LIVE-VQC and YouTube-UGC) demonstrate the effectiveness of our modules.
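The abstract does not give the architecture in detail, so the following is only a generic illustration of one idea it mentions: aggregating frame-level representations into a video-level representation under the guidance of attention. It is a minimal sketch of softmax attention-weighted pooling, not the authors' memory attention module. The function names (`softmax`, `attention_pool`), the toy feature vectors, and the per-frame attention logits are all hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, attention_logits):
    """Pool T frame-level feature vectors into one video-level vector.

    frame_features: list of T vectors, each a list of D floats (assumed input).
    attention_logits: list of T per-frame scores (hypothetical; in a real
    model these would be produced by a learned attention module).
    """
    if len(frame_features) != len(attention_logits):
        raise ValueError("need one attention logit per frame")
    weights = softmax(attention_logits)
    dim = len(frame_features[0])
    # Weighted sum over frames, computed independently for each feature dimension.
    return [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]


# Toy data: three frames with two-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform = attention_pool(frames, [0.0, 0.0, 0.0])   # equal weights: plain average
focused = attention_pool(frames, [0.0, 0.0, 10.0])  # weight concentrated on frame 3
```

With equal logits the pooling reduces to a plain temporal average; raising one frame's logit pulls the video-level vector toward that frame's features, which is the basic effect attention-guided aggregation relies on.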