A No-Reference Video Quality Assessment Method with Bidirectional Hierarchical Semantic Representation

Longbin Mo, Haibing Yin, Hongkui Wang, Xiaofeng Huang, Jucai Lin, Yaguang Xie, YiChen Liu, Ning Sheng, Xianghong Tang
DOI: https://doi.org/10.1016/j.sigpro.2024.109819
IF: 4.729
2024-01-01
Signal Processing
Abstract: Perceived quality assessment for user-generated content (UGC) videos is important for safeguarding the viewing experience of end-users. The diversity of content and the blend of authentic distortions pose great challenges for UGC video quality assessment (UGC-VQA). The reverse hierarchy theory suggests that the human visual system (HVS) combines bottom-up feedforward perception with top-down feedback perception. However, existing UGC-VQA methods rarely consider feedback perception, which makes it difficult to model the complete visual perception loop and leads to inaccurate prediction of perceived quality. This paper therefore proposes a bidirectional hierarchical semantic extraction structure for VQA (BHSE-VQA), which simulates visual feedforward and feedback perception. Specifically, a feedforward and feedback multi-level network is first designed: the feedforward pathway extracts multi-level spatio-temporal features with a 3D-ConvNext backbone, and the feedback pathway processes these hierarchical features with a combination of channel attention and spatial attention mechanisms. Then, because responses at different perception layers affect visual perception results to different degrees, the weights of the features at each level are redistributed to be consistent with human perception. Given the bidirectional hierarchical pathway features, a temporal attention fusion network is introduced to further capture temporal correlations and to aggregate features, relying on the residuals between feedforward and feedback features, for quality prediction. Experimental results on several representative UGC-VQA databases demonstrate the effectiveness of the proposed model and the significance of comprehensive hierarchical perception modeling for UGC-VQA.
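The abstract gives no formulas, so the following is only a minimal sketch of two of the ideas it names: reweighting features across hierarchy levels, and temporal attention driven by the residual between feedforward and feedback features. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names `fuse_levels` and `temporal_attention_pool`, the use of a softmax for the weights, the use of the residual norm as the attention score, and the choice to pool the feedback features.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_levels(level_feats, level_logits):
    """Weighted sum of per-level feature vectors.

    level_feats: one equal-length feature vector per hierarchy level.
    level_logits: one score per level (hypothetical learnable parameters).
    """
    weights = softmax(level_logits)
    dim = len(level_feats[0])
    return [sum(w * f[i] for w, f in zip(weights, level_feats)) for i in range(dim)]


def temporal_attention_pool(ff_frames, fb_frames):
    """Pool per-frame feedback features over time.

    Assumption: each frame's attention score is the L2 norm of the residual
    between its feedforward and feedback feature vectors.
    Returns (pooled_vector, attention_weights).
    """
    scores = []
    for ff, fb in zip(ff_frames, fb_frames):
        residual = [a - b for a, b in zip(ff, fb)]
        scores.append(math.sqrt(sum(r * r for r in residual)))
    attn = softmax(scores)
    dim = len(fb_frames[0])
    pooled = [sum(w * fb[i] for w, fb in zip(attn, fb_frames)) for i in range(dim)]
    return pooled, attn


if __name__ == "__main__":
    # Two hierarchy levels with equal logits: the fused vector is their mean.
    print(fuse_levels([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))
    # Three frames; the frame with the largest residual gets the most weight.
    ff = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    fb = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    print(temporal_attention_pool(ff, fb)[1])
```

In a trained model the level logits and the attention scoring would be learned rather than fixed, and the features would be tensors produced by the backbone; plain lists are used here only to keep the sketch dependency-free.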