STARVQA: SPACE-TIME ATTENTION FOR VIDEO QUALITY ASSESSMENT

Fengchuang Xing, Yuan-Gen Wang, Hanpin Wang, Leida Li, Guopu Zhu
DOI: https://doi.org/10.1109/icip46576.2022.9897881
2022-01-01
Abstract: Transformers based on the self-attention mechanism are flourishing in computer vision. However, their application to video quality assessment (VQA) has not been reported. Evaluating the quality of in-the-wild videos is challenging because both the pristine reference and the shooting distortion are unknown. This paper presents a novel space-time attention network for the VQA problem, named StarVQA. StarVQA builds a Transformer by alternately concatenating divided space-time attention. To adapt the Transformer architecture for training, StarVQA designs a vectorized regression loss by encoding the mean opinion score (MOS) as a probability vector and embedding a special vectorized label token as a learnable variable. To capture the long-range spatiotemporal dependencies of a video sequence, StarVQA encodes the space-time position information of each patch into the input of the Transformer. Experiments are conducted on the de facto in-the-wild video datasets, including LIVE-VQC, KoNViD-1k, LSVQ, and LSVQ-1080p. Experimental results demonstrate the superiority of StarVQA over the state of the art. The source code is available at https://github.com/GZHU-DVL/StarVQA.
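To make the idea of a vectorized regression loss concrete, the following is a minimal sketch of one way a scalar MOS could be encoded as a probability vector and compared with a predicted vector. The abstract does not give the encoding, so everything here is an assumption for illustration: the evenly spaced `anchors`, the softmax over negative squared distances, the `temperature` parameter, the cross-entropy loss, and the function names are hypothetical and may differ from what StarVQA actually uses.

```python
import math

def encode_mos(mos, anchors, temperature=1.0):
    """Map a scalar MOS to a probability vector over anchor scores.

    Assumption: softmax over negative squared distances to the anchors.
    """
    logits = [-((mos - a) ** 2) / temperature for a in anchors]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def decode_mos(probs, anchors):
    """Recover a scalar score as the expectation over the anchors."""
    return sum(p * a for p, a in zip(probs, anchors))

def vectorized_loss(pred_logits, target_probs):
    """Cross-entropy between softmax(pred_logits) and the encoded MOS vector."""
    peak = max(pred_logits)
    log_z = peak + math.log(sum(math.exp(l - peak) for l in pred_logits))
    return -sum(t * (l - log_z) for t, l in zip(target_probs, pred_logits))

# Hypothetical 5-point MOS scale with one anchor per integer score.
anchors = [1.0, 2.0, 3.0, 4.0, 5.0]
target = encode_mos(3.4, anchors)

# A prediction whose softmax equals the target gives the lowest possible loss;
# an uninformative (uniform) prediction gives a higher one.
matched_logits = [math.log(p) for p in target]
uniform_logits = [0.0] * len(anchors)
loss_matched = vectorized_loss(matched_logits, target)
loss_uniform = vectorized_loss(uniform_logits, target)
```

In this sketch the network would output one logit per anchor (for example, read from the label token), and the predicted quality score would be obtained with `decode_mos`. The expectation-based decoding is only an approximate inverse of the encoding, so a real system would need to account for that mismatch.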