THQA: A Perceptual Quality Assessment Database for Talking Heads

Yingjie Zhou, Zicheng Zhang, Wei Sun, Xiaohong Liu, Xiongkuo Min, Zhihua Wang, Xiao-Ping Zhang, Guangtao Zhai
2024-04-13
Abstract: In the realm of media technology, digital humans have gained prominence due to rapid advancements in computer technology. However, the manual modeling and control required for the majority of digital humans pose significant obstacles to efficient development. Speech-driven methods offer a novel avenue for manipulating the mouth shape and expressions of digital humans. Despite the proliferation of driving methods, the quality of many generated talking head (TH) videos remains a concern, impacting user visual experiences. To tackle this issue, this paper introduces the Talking Head Quality Assessment (THQA) database, featuring 800 TH videos generated through 8 diverse speech-driven methods. Extensive experiments affirm the THQA database's richness in character and speech features. Subsequent subjective quality assessment experiments analyze correlations between scoring results and speech-driven methods, ages, and genders. In addition, experimental results show that mainstream image and video quality assessment methods have limitations for the THQA database, underscoring the imperative for further research to enhance TH video quality assessment. The THQA database is publicly accessible at
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses the quality assessment of AI-generated talking head (TH) videos. Existing evaluation methods usually rely on comparison with the original videos, so they cannot be applied when no reference video exists. To address this, the authors construct the Talking Head Quality Assessment (THQA) database, which contains 800 TH videos generated by 8 different speech-driven methods. The videos are produced by animating StyleGAN-generated face images with these speech-driven techniques.

The authors run a subjective experiment to collect a Mean Opinion Score (MOS) for each generated TH video, then analyze how the scores relate to the speech-driven method and to the age and gender of the depicted individual. They also validate common image and video quality assessment methods as benchmarks and find that these mainstream methods perform poorly on the THQA database, which points to the need for further research on TH video quality assessment.

The paper also discusses the limitations of current speech-driven methods: although they simplify expression and motion design, the quality of the TH videos they generate still needs improvement. Overall, by making the THQA database publicly available, the paper provides an evaluation framework for AI-generated TH videos and emphasizes the need for more effective assessment algorithms.
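
The MOS collection and benchmarking described above can be sketched as follows. This is a minimal illustration, not the authors' protocol: the ratings, video names, and predicted scores are hypothetical, and the use of Spearman (SRCC) and Pearson (PLCC) correlation as agreement measures is an assumption, since the text here does not name the criteria used.

```python
from math import sqrt
from statistics import mean


def mean_opinion_scores(ratings):
    """Average each video's raw subject ratings into a single MOS."""
    return {video: mean(scores) for video, scores in ratings.items()}


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical subjective ratings (1-5 scale) from three viewers per video.
ratings = {
    "video_a": [4, 5, 4],
    "video_b": [2, 3, 2],
    "video_c": [3, 3, 4],
    "video_d": [1, 2, 1],
}
# Hypothetical outputs of some objective quality metric for the same videos.
predicted = {"video_a": 0.81, "video_b": 0.40, "video_c": 0.35, "video_d": 0.10}

mos = mean_opinion_scores(ratings)
videos = sorted(mos)
mos_values = [mos[v] for v in videos]
pred_values = [predicted[v] for v in videos]

srcc = spearman(pred_values, mos_values)
plcc = pearson(pred_values, mos_values)
print(f"SRCC={srcc:.3f}  PLCC={plcc:.3f}")
```

A metric that tracks human judgment well would yield correlations close to 1; low values on a dataset would be consistent with the reported finding that mainstream methods have limitations on TH videos.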