An End-to-End No-Reference Video Quality Assessment Method with Hierarchical Spatiotemporal Feature Representation

Wenhao Shen, Mingliang Zhou, Xingran Liao, Weijia Jia, Tao Xiang, Bin Fang, Zhaowei Shang
DOI: https://doi.org/10.1109/tbc.2022.3164332
IF: 4.5
2022-01-01
IEEE Transactions on Broadcasting
Abstract: In this paper, we propose a deep neural network-based no-reference (NR) video quality assessment (VQA) method with spatiotemporal feature fusion and hierarchical information integration to evaluate the perceptual quality of videos. First, a feature extraction model is proposed by using 2D and 3D convolutional layers to gradually extract spatiotemporal features from raw video clips. Second, we design a hierarchical branching network to fuse multiframe features, and the feature vectors at each hierarchical level are comprehensively considered during the process of network optimization. Finally, these two modules and quality regression are synthesized into an end-to-end architecture. Experimental results obtained on benchmark VQA databases demonstrate the superiority of our method over other state-of-the-art algorithms. The source code is available online.
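To make the idea of hierarchical multiframe fusion concrete, the following is a toy sketch only, not the authors' network. It assumes per-frame feature vectors are already available (in the paper these come from learned 2D and 3D convolutional layers), replaces the learned branching network with plain pairwise averaging of adjacent frames, and replaces the learned regressor with a fixed linear readout. All function names, the averaging rule, and the weights are hypothetical and chosen purely for illustration.

```python
from typing import List

Vector = List[float]


def merge_pairs(features: List[Vector]) -> List[Vector]:
    """Fuse adjacent feature vectors by element-wise averaging (one level up)."""
    merged = []
    for i in range(0, len(features) - 1, 2):
        merged.append([(a + b) / 2.0 for a, b in zip(features[i], features[i + 1])])
    if len(features) % 2 == 1:
        # An unpaired last vector is carried up unchanged.
        merged.append(list(features[-1]))
    return merged


def hierarchy_levels(frame_features: List[Vector]) -> List[List[Vector]]:
    """Build every level of the hierarchy, from per-frame features to one vector."""
    if not frame_features:
        raise ValueError("at least one frame feature vector is required")
    levels = [frame_features]
    while len(levels[-1]) > 1:
        levels.append(merge_pairs(levels[-1]))
    return levels


def level_summary(level: List[Vector]) -> Vector:
    """Summarize one level as the element-wise mean of its vectors."""
    n = len(level)
    return [sum(column) / n for column in zip(*level)]


def predict_quality(frame_features: List[Vector], weights: Vector, bias: float = 0.0) -> float:
    """Score each level with a shared linear readout and average over levels.

    Averaging over levels is an assumed stand-in for letting every
    hierarchical level contribute to the final quality estimate.
    """
    levels = hierarchy_levels(frame_features)
    scores = [
        sum(w * x for w, x in zip(weights, level_summary(level))) + bias
        for level in levels
    ]
    return sum(scores) / len(scores)


# Hypothetical 2-dimensional features for a 4-frame clip.
frames = [[1.0, 0.0], [3.0, 0.0], [5.0, 2.0], [7.0, 2.0]]
levels = hierarchy_levels(frames)
score = predict_quality(frames, weights=[1.0, 1.0])
```

Here four frames produce three levels (4, 2, and 1 vectors), and each level contributes one score to the final average. In a trained model the averaging and the readout would be learned layers optimized jointly, as the abstract describes.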