Spatial-Temporal Network for No Reference Video Quality Assessment Based on Saliency

Xueting Wang, Ping Shi, Da Pan
DOI: https://doi.org/10.1109/iccst50977.2020.00026
2020-01-01
Abstract: As the use of digital video grows, so does the need for a high-performance no-reference video quality assessment (NR VQA) model. One of the difficulties in NR VQA is how to effectively capture the properties of the human visual system (HVS) in a data-driven manner. In this paper, we propose a saliency-based spatio-temporal network model with two branches: a spatio-temporal branch and a saliency branch. In the spatio-temporal branch, we propose a basic spatio-temporal network and use it to predict a score for the spatial distortion as affected by temporal information. In the saliency branch, we extract the salient features of the current video frame and merge them with the spatial distortion to predict a result that is more in line with human perception. Finally, the two scores are automatically weighted to obtain the score of the current video. The proposed method is evaluated on two databases, LIVE and CSIQ. The results show that the method largely conforms to the subjective perception of the human eye and performs better than most current no-reference methods.
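The final step, in which the two branch scores are automatically weighted, can be illustrated with a minimal sketch. The abstract does not say how the weights are obtained, so everything here is an assumption for illustration only: the function name `fuse_scores`, the two `logits` (stand-ins for parameters a network might learn), and the use of a softmax to keep the weights positive and summing to one. It is not the authors' implementation.

```python
import math


def fuse_scores(spatiotemporal_score, saliency_score, logits=(0.0, 0.0)):
    """Combine two branch scores with softmax-normalized weights.

    `logits` is a hypothetical pair of learnable parameters; the paper's
    actual weighting mechanism is not described in the abstract.
    """
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    w_st, w_sal = (e / total for e in exps)
    # Convex combination: the result always lies between the two scores.
    return w_st * spatiotemporal_score + w_sal * saliency_score


if __name__ == "__main__":
    # Equal logits weight both branches equally.
    print(fuse_scores(60.0, 80.0))
    # A larger second logit shifts the result toward the saliency branch.
    print(fuse_scores(60.0, 80.0, logits=(0.0, 2.0)))
```

Because the weights form a convex combination, the fused score can never fall outside the range spanned by the two branch predictions, which is one reason such a normalization might be chosen.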