Deep Neural Networks for End-to-End Spatiotemporal Video Quality Prediction and Aggregation

Junming Chen, Haiqiang Wang, Munan Xu, Ge Li, Shan Liu
DOI: https://doi.org/10.1109/icme51207.2021.9428209
2021-01-01
Abstract: We propose an end-to-end deep neural network-based approach for full-reference video quality assessment (VQA). Many VQA methods first predict local quality and then apply a pooling mechanism to obtain a global score. However, these two steps are mostly independent of each other and therefore cannot consider the spatial and temporal information of a video simultaneously. The proposed method combines local quality prediction and aggregation in a unified network that is trained end to end. Specifically, we predict the quality of local spatiotemporal segments with an attention-based convolutional neural network. Furthermore, we propose a spatiotemporal aggregation network (STAN) to allocate adaptive weights to localized quality scores. The aggregation network adopts Convolutional 3D (C3D) kernels to better characterize the quality of videos. Experimental results demonstrate that our method achieves superior performance in comparison with state-of-the-art methods on commonly used video quality datasets.
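
The contrast between fixed pooling and adaptively weighted aggregation of local quality scores can be illustrated with a minimal sketch. This is not the STAN architecture: the abstract does not specify how the weights are computed, so the sketch assumes one weight logit per spatiotemporal segment, normalized with a softmax. The function names `mean_pool` and `aggregate_quality` and the inputs are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat sequence of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(local_scores: Sequence[float]) -> float:
    """Fixed pooling baseline: every segment contributes equally."""
    if not local_scores:
        raise ValueError("at least one local score is required")
    return sum(local_scores) / len(local_scores)


def aggregate_quality(local_scores: Sequence[float],
                      weight_logits: Sequence[float]) -> float:
    """Weighted aggregation of per-segment quality scores.

    Assumption (not from the paper): weights are a softmax over one
    logit per segment. In a learned model these logits would be
    produced by the aggregation network rather than passed in.
    """
    if not local_scores:
        raise ValueError("at least one local score is required")
    if len(local_scores) != len(weight_logits):
        raise ValueError("scores and logits must have the same length")
    weights = softmax(weight_logits)
    return sum(w * s for w, s in zip(weights, local_scores))
```

With equal logits the weighted aggregation reduces to mean pooling; as one logit grows, the global score moves toward that segment's local score, which is the behavior that adaptive weighting is meant to allow.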