Abstract: In recent years, digital humans have been widely applied in augmented/virtual reality (A/VR), where viewers can freely observe and interact with the volumetric content. However, digital humans may be degraded by various distortions during generation and transmission, and little effort has been put into their perceptual quality assessment. It is therefore urgent to develop objective quality assessment methods to tackle the challenge of digital human quality assessment (DHQA). In this paper, we develop a novel no-reference (NR) method based on Transformer to deal with DHQA in a multi-task manner. Specifically, the front 2D projections of the digital humans are rendered as inputs and the vision transformer (ViT) is employed for feature extraction. Then we design a multi-task module to jointly classify the distortion types and predict the perceptual quality levels of digital humans. The experimental results show that the proposed method correlates well with the subjective ratings and outperforms the state-of-the-art quality assessment methods.

What problem does this paper attempt to address?

The paper primarily focuses on the quality assessment of Digital Human Heads (DHH). With the development of Augmented Reality (AR) and Virtual Reality (VR) technologies, digital humans are increasingly used in these fields. However, various distortions may be introduced during the generation and transmission of digital humans, affecting their visual quality. There is currently limited work on the perceptual quality assessment of digital humans, especially digital human heads, so objective quality assessment methods are urgently needed to address the challenges of Digital Human Quality Assessment (DHQA).

The paper proposes a Transformer-based No-Reference (NR) multi-task learning method to solve this problem. The method is implemented through the following steps:

1. **Projection Module**: The frontal 2D projection of the digital human head is taken as input.
2. **Feature Extraction Module**: The Vision Transformer (ViT) is used to extract features from the projection image.
3. **Multi-task Module**: A multi-task module is designed with two sub-tasks: one classifies the specific distortion type of the digital human head, and the other predicts its perceptual quality level.

Experimental results show that the proposed method is highly correlated with subjective scores and outperforms current state-of-the-art quality assessment methods in prediction accuracy. Comparisons with other No-Reference Image Quality Assessment (NR IQA) methods and Full-Reference Point Cloud Quality Assessment (FR PCQA) methods further demonstrate its effectiveness. Finally, the paper conducts ablation experiments to verify the effectiveness of multi-task learning and compares the performance of different backbone networks.
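To make the multi-task idea concrete, here is a minimal pure-Python sketch of two heads sharing one feature vector. It is an illustration under stated assumptions, not the paper's implementation: the feature vector stands in for the ViT output, and the function names, the linear heads, the cross-entropy plus squared-error loss, and the `weight` parameter are all hypothetical choices that the summary does not specify.

```python
import math
from typing import List, Sequence, Tuple


def linear(x: Sequence[float],
           weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_task_forward(features: Sequence[float],
                       cls_w: Sequence[Sequence[float]],
                       cls_b: Sequence[float],
                       reg_w: Sequence[float],
                       reg_b: float) -> Tuple[List[float], float]:
    """Run both heads on one shared feature vector.

    `features` is a placeholder for the backbone output (assumption).
    Returns distortion-type probabilities and a scalar quality estimate.
    """
    distortion_probs = softmax(linear(features, cls_w, cls_b))
    quality = linear(features, [reg_w], [reg_b])[0]
    return distortion_probs, quality


def multi_task_loss(distortion_probs: Sequence[float],
                    true_type: int,
                    quality: float,
                    true_quality: float,
                    weight: float = 1.0) -> float:
    """Assumed joint objective: cross-entropy on the distortion type
    plus a weighted squared error on the quality estimate."""
    cross_entropy = -math.log(distortion_probs[true_type])
    squared_error = (quality - true_quality) ** 2
    return cross_entropy + weight * squared_error


if __name__ == "__main__":
    # Toy inputs: 3-dimensional features, 2 distortion types.
    feats = [0.5, -1.0, 2.0]
    probs, q = multi_task_forward(
        feats,
        cls_w=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
        cls_b=[0.0, 0.0],
        reg_w=[1.0, 0.0, 0.5],
        reg_b=0.0,
    )
    print(probs, q, multi_task_loss(probs, 0, q, true_quality=1.5))
```

Sharing the features between the two heads is what lets the distortion-type labels act as an auxiliary signal for the quality prediction. In a real system the heads would be trained jointly with the backbone in a deep learning framework rather than evaluated with fixed weights as here.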

A No-Reference Quality Assessment Method for Digital Human Head
