Advancing Video Quality Assessment for AIGC

Xinli Yue, Jianhui Sun, Han Kong, Liangchao Yao, Tianyi Wang, Lei Li, Fengyun Rao, Jing Lv, Fan Xia, Yuetang Deng, Qian Wang, Lingchen Zhao

2024-09-23

Abstract: In recent years, AI generative models have made remarkable progress across various domains, including text generation, image generation, and video generation. However, assessing the quality of text-to-video generation is still in its infancy, and existing evaluation frameworks fall short when compared to those for natural videos. Current video quality assessment (VQA) methods primarily focus on evaluating the overall quality of natural videos and fail to adequately account for the substantial quality discrepancies between frames in generated videos. To address this issue, we propose a novel loss function that combines mean absolute error with cross-entropy loss to mitigate inter-frame quality inconsistencies. Additionally, we introduce the innovative S2CNet technique to retain critical content, while leveraging adversarial training to enhance the model's generalization capabilities. Experimental results demonstrate that our method outperforms existing VQA techniques on the AIGC Video dataset, surpassing the previous state-of-the-art by 3.1% in terms of PLCC.
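The headline result is reported in PLCC (Pearson linear correlation coefficient) between predicted quality scores and human ratings. As a rough illustration of what that metric measures, here is a minimal sketch in Python with NumPy. The function name `plcc` and the sample scores are illustrative assumptions, not taken from the paper. VQA evaluations often fit a nonlinear mapping to the predictions before computing PLCC; that step is omitted here.

```python
import numpy as np


def plcc(predicted, mos):
    """Pearson linear correlation between predictions and mean opinion scores.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    predicted = np.asarray(predicted, dtype=float)
    mos = np.asarray(mos, dtype=float)
    if predicted.ndim != 1 or predicted.shape != mos.shape:
        raise ValueError("inputs must be 1-D arrays of equal length")

    # Center both series, then divide the covariance term by the
    # product of their root-sum-of-squares.
    p = predicted - predicted.mean()
    m = mos - mos.mean()
    denom = np.sqrt((p ** 2).sum() * (m ** 2).sum())
    if denom == 0:
        raise ValueError("PLCC is undefined for constant inputs")
    return float((p * m).sum() / denom)


# Made-up scores for five videos (illustrative only).
model_scores = [0.2, 0.4, 0.5, 0.7, 0.9]
human_scores = [1.5, 2.4, 2.9, 3.8, 4.6]
print(plcc(model_scores, human_scores))
```

A value close to 1 means the predictions track the human ratings almost linearly; a value near 0 means no linear relationship.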

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses quality assessment for generated videos, particularly text-to-video generation. Existing video quality assessment (VQA) methods mainly assess the overall quality of natural videos and handle the inter-frame quality differences of generated videos poorly. To solve this problem, the authors propose the following innovations:

1. **Frame Consistency Loss (FCL)**: Combines Mean Absolute Error (MAE) loss and Binary Cross-Entropy (BCE) loss to alleviate inter-frame quality inconsistency in generated videos.
2. **S2CNet Technology**: Introduces a content-aware cropping algorithm that retains key content areas, thereby capturing richer and more comprehensive features.
3. **Adversarial Training**: Explores the application of adversarial training to video quality assessment, enhancing the model's generalization ability by introducing adversarial perturbations.

Experimental results show that the method outperforms existing techniques on the AIGC video dataset, improving PLCC by 3.1% over the previous state of the art. It also achieved second place in the NTIRE 2024 S-UGC VQA Challenge, demonstrating its effectiveness across different video types.
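To make the idea of combining MAE with BCE concrete, here is a minimal NumPy sketch of one way such a loss could be written. The summary only states that the two terms are combined; the function name `mae_bce_loss`, the `bce_weight` parameter, the plain weighted sum, and the assumption that scores are normalized to [0, 1] are all illustrative choices and may differ from the authors' actual formulation.

```python
import numpy as np


def mae_bce_loss(pred, target, bce_weight=1.0, eps=1e-7):
    """Weighted sum of MAE and binary cross-entropy on scores in [0, 1].

    Hypothetical sketch: the weighting and normalization are assumptions,
    not the paper's definition of its loss.
    """
    # Clip predictions so the logarithms below stay finite.
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    # MAE term: average absolute gap between prediction and target.
    mae = np.mean(np.abs(pred - target))
    # BCE term: treats each normalized score as a soft label.
    bce = -np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(mae + bce_weight * bce)


# Made-up normalized quality scores for four samples (illustrative only).
targets = np.array([0.2, 0.5, 0.7, 0.9])
close_preds = np.array([0.25, 0.45, 0.7, 0.85])
far_preds = np.array([0.9, 0.1, 0.2, 0.3])
print(mae_bce_loss(close_preds, targets))
print(mae_bce_loss(far_preds, targets))
```

In this sketch, predictions that sit closer to the targets yield a lower loss. The BCE term penalizes confident errors more steeply than the MAE term does.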

AIGC-VQA: A Holistic Perception Metric for AIGC Video Quality Assessment

Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap

AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM

PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition

TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment

AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI

A Survey of AI-Generated Video Evaluation

AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment

XGC-VQA: A unified video quality assessment model for User, Professionally, and Occupationally-Generated Content

CLIP-AGIQA: Boosting the Performance of AI-Generated Image Quality Assessment with CLIP

Video Quality Assessment: A Comprehensive Survey

A Perceptual Quality Assessment Exploration for AIGC Images

Quality Prediction of AI Generated Images and Videos: Emerging Trends and Opportunities

I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models

SB-VQA: A Stack-Based Video Quality Assessment Framework for Video Enhancement

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

GAIA: Rethinking Action Quality Assessment for AI-Generated Videos

Video Transformer based Video Quality Assessment with Spatiotemporally adaptive Token Selection and Assembly