THQA: A Perceptual Quality Assessment Database for Talking Heads

Yingjie Zhou, Zicheng Zhang, Wei Sun, Xiaohong Liu, Xiongkuo Min, Zhihua Wang, Xiao-Ping Zhang, Guangtao Zhai

2024-04-13

Abstract: In the realm of media technology, digital humans have gained prominence due to rapid advancements in computer technology. However, the manual modeling and control required for the majority of digital humans pose significant obstacles to efficient development. Speech-driven methods offer a novel avenue for manipulating the mouth shape and expressions of digital humans. Despite the proliferation of driving methods, the quality of many generated talking head (TH) videos remains a concern, impacting user visual experiences. To tackle this issue, this paper introduces the Talking Head Quality Assessment (THQA) database, featuring 800 TH videos generated through 8 diverse speech-driven methods. Extensive experiments affirm the THQA database's richness in character and speech features. Subsequent subjective quality assessment experiments analyze correlations between scoring results and speech-driven methods, ages, and genders. In addition, experimental results show that mainstream image and video quality assessment methods have limitations for the THQA database, underscoring the imperative for further research to enhance TH video quality assessment. The THQA database is publicly accessible at

Computer Vision and Pattern Recognition, Image and Video Processing

What problem does this paper attempt to address?

This paper addresses the quality assessment of AI-generated Talking Head (TH) videos. Existing evaluation approaches usually compare a generated video against an original video, which is not possible when no reference video exists. The paper also notes a limitation of current speech-driven methods: although they simplify expression and motion design, the quality of the TH videos they produce still needs improvement.

To study the problem, the authors construct the THQA (Talking Head Quality Assessment) database, which contains 800 TH videos generated by 8 different speech-driven methods. The videos are produced by animating StyleGAN-generated face images with these speech-driven techniques. The database is publicly available and is intended to support research on TH video quality assessment.

The authors then run a subjective experiment to collect a Mean Opinion Score (MOS) for each generated TH video. They analyze how the ratings relate to the speech-driven method and to the age and gender of the depicted individuals, and they benchmark common image and video quality assessment methods against the collected scores. These mainstream methods perform poorly on the THQA database, which points to the need for assessment algorithms designed for TH videos.
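The benchmarking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the video names, ratings, and predicted scores are hypothetical, and the helper functions are written for this example rather than taken from the paper. The summary does not say which criteria the paper reports; Spearman rank correlation (SRCC) and Pearson linear correlation (PLCC) are used here because they are common choices in quality assessment work.

```python
from math import sqrt
from statistics import mean


def mean_opinion_scores(ratings):
    """Average each video's subjective ratings into one MOS per video."""
    return {video: mean(scores) for video, scores in ratings.items()}


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ratings from three viewers per video (not data from THQA).
ratings = {
    "video_a": [4, 5, 4],
    "video_b": [2, 3, 2],
    "video_c": [3, 3, 4],
    "video_d": [1, 2, 1],
}
# Hypothetical outputs of some objective quality metric for the same videos.
predicted = {"video_a": 0.9, "video_b": 0.6, "video_c": 0.5, "video_d": 0.1}

mos = mean_opinion_scores(ratings)
videos = sorted(mos)
mos_values = [mos[v] for v in videos]
pred_values = [predicted[v] for v in videos]

srcc = spearman(mos_values, pred_values)
plcc = pearson(mos_values, pred_values)
print(f"SRCC={srcc:.3f} PLCC={plcc:.3f}")
```

Correlations close to 1 would mean the metric orders and scales the videos much as viewers did; low values correspond to the kind of poor agreement the summary reports for mainstream methods on THQA.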

A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos

Perceptual Quality Assessment for Digital Human Heads

DDH-QA: A Dynamic Digital Humans Quality Assessment Database

Advancing Zero-Shot Digital Human Quality Assessment through Text-Prompted Evaluation

Video Quality Assessment: A Comprehensive Survey

EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head

Human-Computer Interaction System: A Survey of Talking-Head Generation

Audio-Visual Quality Assessment for User Generated Content: Database and Method

A no-reference quality assessment metric for dynamic 3D digital human

A No-Reference Quality Assessment Method for Digital Human Head

Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis

A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing

Subjective and Objective Audio-Visual Quality Assessment for User Generated Content

Telepresence Video Quality Assessment

Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric

THU Face Database for Real-Time Automatic Video Scoring Model

Perceptual Video Quality Assessment: A Survey

Visual Quality Assessment for Web Videos

Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach

QoE-Oriented Multimedia Assessment: A Facial Expression Recognition Approach