Integrates Spatiotemporal Visual Stimuli for Video Quality Assessment

Wenzhong Guo,Kairui Zhang,Xiao Ke
DOI: https://doi.org/10.1109/tbc.2023.3312932
IF: 4.5
2024-01-01
IEEE Transactions on Broadcasting
Abstract:While feature extraction employing pre-trained models proves effective and efficient for no-reference video tasks, it falls short of adequately accounting for the intricacies of the Human Visual System (HVS). In this study, we proposed a novel approach to I ntegration of spatio-temporal V isual S timuli into V ideo Q uality A ssessment (IVS-VQA) for the inaugural time. Exploiting the heightened sensitivity of optic rod cells to edges and motion, along with the capability to track motion via conjugate gaze, our approach affords a distinctive perspective on video quality assessment. To capture significant changes at each timestamp, we incorporate edge information to enhance the feature extraction of the pre-trained model. To tackle pronounced motion across the timeline, we introduce an interactive temporal disparity query employing a dual-branch transformer architecture. This approach adeptly introduces feature biases and extracts comprehensive global attention, culminating in enhanced emphasis on non-continuous segments within the video. Additionally, we integrate low-level color texture information within the temporal domain to comprehensively capture distortions spanning various scales, both higher and lower. Empirical results illustrate that the proposed model attains state-of-the-art performance across all six benchmark databases, along with their corresponding weighted averages.
What problem does this paper attempt to address?