Abstract: Assessing action quality is both imperative and challenging due to its significant impact on the quality of AI-generated videos, further complicated by the inherently ambiguous nature of actions within AI-generated video (AIGV). Current action quality assessment (AQA) algorithms predominantly focus on actions from real specific scenarios and are pre-trained with normative action features, thus rendering them inapplicable in AIGVs. To address these problems, we construct GAIA, a Generic AI-generated Action dataset, by conducting a large-scale subjective evaluation from a novel causal reasoning-based perspective, resulting in 971,244 ratings among 9,180 video-action pairs. Based on GAIA, we evaluate a suite of popular text-to-video (T2V) models on their ability to generate visually rational actions, revealing their pros and cons on different categories of actions. We also extend GAIA as a testbed to benchmark the AQA capacity of existing automatic evaluation methods. Results show that traditional AQA methods, action-related metrics in recent T2V benchmarks, and mainstream video quality methods perform poorly with an average SRCC of 0.454, 0.191, and 0.519, respectively, indicating a sizable gap between current models and human action perception patterns in AIGVs. Our findings underscore the significance of action quality as a unique perspective for studying AIGVs and can catalyze progress towards methods with enhanced capacities for AQA in AIGVs.

What problem does this paper attempt to address?

The paper addresses the poor performance of existing Action Quality Assessment (AQA) methods when they are used to evaluate the quality of actions in AI-generated videos (AIGVs). Specifically:

1. **Limitations of Existing AQA Datasets**:
   - Existing AQA datasets mainly cover actions in real videos from specific domains such as sports and fitness, and the collected scores are primarily coarse-grained professional ratings.
   - Their content varies little, because the subjects usually perform similar actions in consistent environments, so the datasets lack scene diversity.
2. **Shortcomings of Existing AQA Methods**:
   - Existing AQA methods are mainly based on pose or visual feature extraction, aggregation, and score regression. They typically rely on powerful pre-trained 3D backbone networks to achieve better feature transferability.
   - However, generated videos may contain atypical actions, such as abnormal limb counts, illogical object shapes, and physically impossible movements, so models learned from real videos perform poorly on AIGVs.
3. **Special Challenges of AI-Generated Videos**:
   - Generated videos differ fundamentally from real videos, which makes the quality of their actions harder to evaluate.
   - With the rapid growth in the number of text-to-video (T2V) models, evaluating video action quality has become a more pressing problem that requires reliable solutions.

To address these issues, the paper proposes GAIA (Generic AI-generated Action dataset), a large-scale subjective evaluation dataset that assesses the quality of actions in AI-generated videos from a causal reasoning-based perspective. The GAIA dataset includes 9,180 video-action pairs with a total of 971,244 human ratings, covering a variety of full-body, hand, and facial actions.
Using this dataset, the paper evaluates the ability of 18 popular T2V models to generate visually reasonable actions and reveals their strengths and weaknesses across different categories of actions. Additionally, GAIA is used as a benchmark platform to evaluate the performance of existing automatic evaluation methods in AQA tasks. The results show that traditional AQA methods, action-related metrics in recent T2V benchmarks, and mainstream video quality methods perform poorly in AIGV, with average SRCCs of 0.454, 0.191, and 0.519, respectively, indicating a significant gap between current models and human action perception patterns.
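
The SRCC figures above are Spearman rank correlation coefficients: a metric's scores and the human ratings are each converted to ranks, and the Pearson correlation of the two rank lists is reported. The following is a minimal standard-library sketch of that computation, not the paper's evaluation code. The function names and the five sample scores are hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if an input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def srcc(predicted, human):
    """Spearman rank correlation between metric scores and human ratings."""
    if len(predicted) != len(human) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(predicted), average_ranks(human))


# Hypothetical example: mean human ratings and one metric's scores for 5 videos.
human_scores = [4.2, 1.5, 3.1, 2.8, 3.9]
metric_scores = [0.71, 0.40, 0.35, 0.52, 0.66]
print(f"SRCC = {srcc(metric_scores, human_scores):.3f}")
```

An SRCC of 1 means the metric orders the videos exactly as human raters do, 0 means no monotonic relationship, and -1 means the order is reversed. Because only ranks matter, the metric's scale does not need to match the rating scale.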

GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
