Advancing Video Quality Assessment for AIGC

Xinli Yue, Jianhui Sun, Han Kong, Liangchao Yao, Tianyi Wang, Lei Li, Fengyun Rao, Jing Lv, Fan Xia, Yuetang Deng, Qian Wang, Lingchen Zhao
2024-09-23
Abstract: In recent years, AI generative models have made remarkable progress across various domains, including text generation, image generation, and video generation. However, assessing the quality of text-to-video generation is still in its infancy, and existing evaluation frameworks fall short when compared to those for natural videos. Current video quality assessment (VQA) methods primarily focus on evaluating the overall quality of natural videos and fail to adequately account for the substantial quality discrepancies between frames in generated videos. To address this issue, we propose a novel loss function that combines mean absolute error with cross-entropy loss to mitigate inter-frame quality inconsistencies. Additionally, we introduce the innovative S2CNet technique to retain critical content, while leveraging adversarial training to enhance the model's generalization capabilities. Experimental results demonstrate that our method outperforms existing VQA techniques on the AIGC Video dataset, surpassing the previous state-of-the-art by 3.1% in terms of PLCC.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses quality assessment for AI-generated videos, in particular text-to-video generation. Existing video quality assessment (VQA) methods mainly score the overall quality of natural videos and do not adequately handle the large quality differences between frames that are typical of generated videos. To address this, the authors propose three components:

1. **Frame Consistency Loss (FCL)**: Combines Mean Absolute Error (MAE) loss and Binary Cross-Entropy (BCE) loss to alleviate inter-frame quality inconsistency in generated videos.
2. **S2CNet**: A content-aware cropping algorithm that retains key content areas, so that the model captures richer and more comprehensive features.
3. **Adversarial Training**: Applies adversarial training to the video quality assessment task, introducing adversarial perturbations to improve the model's generalization ability.

Experimental results show that the method outperforms existing VQA techniques on the AIGC Video dataset, surpassing the previous state of the art by 3.1% in PLCC. It also placed second in the NTIRE 2024 S-UGC VQA Challenge, which suggests that it is effective across different video types.
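The summary gives no formula for the loss, so the following is only a minimal sketch of what combining an MAE term with a BCE term over per-frame scores could look like, together with the PLCC metric used for evaluation. It is not the authors' implementation. The function names, the `bce_weight` parameter, the normalization of scores to [0, 1], and the use of the video-level score as a soft target for every frame are all assumptions made for illustration.

```python
import math


def mae_loss(frame_scores, target):
    """Mean absolute error between each frame's predicted score and the target."""
    return sum(abs(p - target) for p in frame_scores) / len(frame_scores)


def bce_loss(frame_scores, target, eps=1e-7):
    """Binary cross-entropy with a soft target in [0, 1], averaged over frames."""
    total = 0.0
    for p in frame_scores:
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return total / len(frame_scores)


def combined_loss(frame_scores, target, bce_weight=1.0):
    """Hypothetical MAE + BCE combination; the weighting scheme is an assumption."""
    return mae_loss(frame_scores, target) + bce_weight * bce_loss(frame_scores, target)


def plcc(predicted, ground_truth):
    """Pearson linear correlation coefficient between two equal-length score lists."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_g = sum(ground_truth) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, ground_truth))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    std_g = math.sqrt(sum((g - mean_g) ** 2 for g in ground_truth))
    return cov / (std_p * std_g)


# Two hypothetical videos with the same mean frame score (0.7):
consistent = [0.7, 0.7, 0.7]
inconsistent = [0.4, 1.0, 0.7]
loss_consistent = combined_loss(consistent, target=0.7)
loss_inconsistent = combined_loss(inconsistent, target=0.7)
```

In this sketch both terms are applied per frame, so the video whose frame scores fluctuate receives a larger loss than the one whose frames all match the target, even though the two share the same mean score. With a soft target the BCE term does not reach zero at a perfect prediction; it reaches its minimum there.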