Video Quality Assessment for Spatio-Temporal Resolution Adaptive Coding

Hanwei Zhu, Baoliang Chen, Lingyu Zhu, Peilin Chen, Linqi Song, Shiqi Wang
DOI: https://doi.org/10.1109/tcsvt.2024.3367904
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Spatio-temporal resolution adaptive (STRA) coding has repeatedly been shown to be a promising way to improve coding efficiency and reduce coding complexity. There is broad consensus that the optimal subsampled resolution and frame rate should be governed by the so-called generalized rate-distortion performance, which is based on the ultimately perceived distortion. However, accurately predicting the quality of reconstructed videos is non-trivial because the distortion originates from both subsampling and compression. To address this issue, we propose a novel video quality assessment model that is fully aware of the information available in videos downsampled for compression, such as resolution and frame rate. More specifically, the proposed model relies on quality-aware spatial features extracted by a backbone fine-tuned for image quality. The spatio-temporal quality is then modeled with a transformer encoder that adapts to the spatial and temporal resolutions used for downsampling. This enables the transformer encoder to produce discriminative features that capture long-range temporal dependencies related to the current context. The quality score, which is the output of the transformer encoder, thus reflects the influence of both subsampling and compression. We conduct extensive experiments that demonstrate the superiority of the proposed model over state-of-the-art methods on four subsampling and compression video quality datasets. Furthermore, we apply the proposed model to bitrate ladder optimization, leading to a perceptually aware spatial and temporal downsampling strategy that yields promising bitrate savings. The source code of the proposed model will be publicly available at https://github.com/h4nwei/STRA-VQA.
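The bitrate ladder application mentioned in the abstract can be illustrated with a minimal sketch: for each target bitrate, pick the resolution and frame rate whose predicted quality is highest. Everything below is an assumption made for illustration. The names `build_bitrate_ladder` and `toy_quality`, the candidate configurations, and the constants inside `toy_quality` are hypothetical; `toy_quality` is a hand-written stand-in, not the paper's learned model, and its numbers are not results from the paper.

```python
from typing import Callable, Dict, List, Tuple

# One candidate operating point: (width, height, frames per second).
Config = Tuple[int, int, int]

# Reference (non-subsampled) format assumed for this illustration only.
FULL: Config = (1920, 1080, 60)


def build_bitrate_ladder(
    bitrates_kbps: List[int],
    configs: List[Config],
    predict_quality: Callable[[Config, int], float],
) -> Dict[int, Config]:
    """For each target bitrate, keep the config with the highest predicted quality."""
    return {
        rate: max(configs, key=lambda cfg: predict_quality(cfg, rate))
        for rate in bitrates_kbps
    }


def toy_quality(cfg: Config, rate_kbps: int) -> float:
    """Hypothetical stand-in for a quality predictor (NOT the paper's model).

    It multiplies two made-up terms: one that grows with the bits available
    per pixel (less compression distortion) and one that shrinks as the video
    is subsampled further below the reference format.
    """
    width, height, fps = cfg
    pixels_per_second = width * height * fps
    reference = FULL[0] * FULL[1] * FULL[2]

    bits_per_pixel = rate_kbps * 1000 / pixels_per_second
    compression_term = bits_per_pixel / (bits_per_pixel + 0.05)  # saturates toward 1
    subsampling_term = (pixels_per_second / reference) ** 0.25   # 1.0 at full format
    return compression_term * subsampling_term


candidates: List[Config] = [(1920, 1080, 60), (1280, 720, 60), (1280, 720, 30)]
ladder = build_bitrate_ladder([500, 8000, 50000], candidates, toy_quality)

for rate, (w, h, fps) in ladder.items():
    print(f"{rate} kbps -> {w}x{h} @ {fps} fps")
```

In a real pipeline, `predict_quality` would be replaced by a quality model evaluated on actual encodes of each candidate, and the selection step itself would stay the same. With the toy predictor, lower bitrates favor stronger subsampling and higher bitrates favor the full format, which is the trade-off that generalized rate-distortion optimization is meant to capture.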