Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric

Zhichao Zhang, Wei Sun, Xinyue Li, Yunhao Li, Qihang Ge, Jun Jia, Zicheng Zhang, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Xiongkuo Min, Guangtao Zhai
2024-11-26
Abstract: AI-driven video generation techniques have made significant progress in recent years. However, AI-generated videos (AGVs) involving human activities often exhibit substantial visual and semantic distortions, hindering the practical application of video generation technologies in real-world scenarios. To address this challenge, we conduct a pioneering study on human activity AGV quality assessment, focusing on visual quality evaluation and the identification of semantic distortions. First, we construct the AI-Generated Human activity Video Quality Assessment (Human-AGVQA) dataset, consisting of 3,200 AGVs derived from 8 popular text-to-video (T2V) models using 400 text prompts that describe diverse human activities. We conduct a subjective study to evaluate the human appearance quality, action continuity quality, and overall video quality of AGVs, and identify semantic issues of human body parts. Based on Human-AGVQA, we benchmark the performance of T2V models and analyze their strengths and weaknesses in generating different categories of human activities. Second, we develop an objective evaluation metric, named AI-Generated Human activity Video Quality metric (GHVQ), to automatically analyze the quality of human activity AGVs. GHVQ systematically extracts human-focused quality features, AI-generated content-aware quality features, and temporal continuity features, making it a comprehensive and explainable quality metric for human activity AGVs. The extensive experimental results show that GHVQ outperforms existing quality metrics on the Human-AGVQA dataset by a large margin, demonstrating its efficacy in assessing the quality of human activity AGVs. The Human-AGVQA dataset and GHVQ metric will be publicly released at https://github.com/zczhang-sjtu/GHVQ.git
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the visual and semantic distortions found in AI-generated videos (AGVs) that depict human activities. Human-activity videos produced by text-to-video (T2V) models often show significant visual and semantic distortions, which hinder the use of these technologies in practical scenarios. To meet this challenge, the authors conduct a pioneering study built on two contributions:

1. **Constructing a benchmark dataset**:
   - The authors construct a dataset named Human-AGVQA, which contains 3,200 AI-generated human-activity videos produced by 8 popular T2V models.
   - The videos are generated from 400 text prompts describing diverse human activities.
   - The dataset includes subjective labels for human appearance quality, action continuity quality, and overall video quality.
2. **Developing an objective evaluation metric**:
   - The authors propose GHVQ (AI-Generated Human activity Video Quality metric), an objective metric that automatically evaluates the quality of AI-generated human-activity videos.
   - GHVQ systematically extracts human-focused quality features, AI-generated content-aware quality features, and temporal continuity features, which makes it a comprehensive and interpretable quality metric.

### Main problem summary

The paper targets two core problems:

1. **Improving quality assessment of AI-generated videos**:
   - Existing image/video quality assessment (I/VQA) metrics perform poorly on AI-generated videos, especially on complex human-activity videos.
   - Traditional evaluation methods cannot accurately reflect the quality of individual videos or identify specific semantic distortions.
2. **Providing a comprehensive benchmark platform**:
   - With the Human-AGVQA dataset and the GHVQ metric, the authors give researchers a comprehensive benchmark platform for evaluating and improving the quality of AI-generated videos.
   - The platform can help monitor the visual quality of large-scale AI-generated video collections, measure the progress of T2V models, and supply an optimization objective or reward function for improving T2V models.

### Solutions

- **Human-AGVQA dataset**: It contains a large number of diverse AI-generated videos with subjective quality labels, covering a wide range of human-activity categories.
- **GHVQ evaluation metric**: It combines a spatial quality analyzer, an action quality analyzer, a text feature extractor, and a quality regressor to automatically evaluate video quality and identify semantic distortions.

Through these measures, the authors hope to advance the practical use of AI-generated video technology, especially in entertainment, art, advertising, and education.
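
The dataset size quoted above follows from pairing every prompt with every model: 8 T2V models times 400 prompts gives 3,200 videos, each carrying three subjective quality scores. The sketch below only illustrates that arithmetic. The `AGVRecord` class, its field names, and the `build_index` helper are hypothetical and are not taken from the released Human-AGVQA files, whose actual layout may differ.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional

# Counts stated in the abstract.
NUM_MODELS = 8
NUM_PROMPTS = 400


@dataclass
class AGVRecord:
    """Hypothetical record for one generated video (field names are assumptions)."""
    model_id: int
    prompt_id: int
    # The three subjective dimensions named in the abstract; None until annotated.
    human_appearance: Optional[float] = None
    action_continuity: Optional[float] = None
    overall_quality: Optional[float] = None


def build_index(num_models: int = NUM_MODELS,
                num_prompts: int = NUM_PROMPTS) -> List[AGVRecord]:
    """Enumerate one record per (model, prompt) pair."""
    return [AGVRecord(m, p)
            for m, p in product(range(num_models), range(num_prompts))]


records = build_index()
print(len(records))  # 8 * 400 = 3200
```

Keeping one record per (model, prompt) pair makes it straightforward to group scores by model when benchmarking T2V systems, or by prompt when comparing activity categories.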