The impact of differences in facial features between real speakers and 3D face models on synthesized lip motions

Rabab Algadhy, Yoshihiko Gotoh, Steve Maddock
2024-07-24
Abstract: Lip motion accuracy is important for speech intelligibility, especially for users who are hard of hearing or second language learners. A high level of realism in lip movements is also required for the game and film production industries. 3D morphable models (3DMMs) have been widely used for facial analysis and animation. However, factors that could influence their use in facial animation, such as the differences in facial features between recorded real faces and animated synthetic faces, have not been given adequate attention. This paper investigates the mapping between real speakers and similar and non-similar 3DMMs and the impact on the resulting 3D lip motion. Mouth height and mouth width are used to determine face similarity. The results show that mapping 2D videos of real speakers with low mouth heights to 3D heads that correspond to real speakers with high mouth heights, or vice versa, generates less good 3D lip motion. It is thus important that such a mismatch is considered when using a 2D recording of a real actor's lip movements to control a 3D synthetic character.
Graphics
What problem does this paper attempt to address?
This paper studies how differences in facial features between real speakers and 3D face models affect synthesized lip movements. Specifically, the authors examine how differences in facial features (especially mouth height and mouth width) affect the resulting 3D lip motion when the lip movements of real speakers in 2D videos are used to control 3D synthetic characters.

### Problem Background

1. **Importance of Lip Movement Accuracy**
   - For people who are hard of hearing and for second language learners, accurate lip movements are crucial for understanding speech.
   - The game and film production industries also require realistic lip movements.
2. **Deficiencies of Existing Methods**
   - Although 3D morphable models (3DMMs) have been widely used in facial analysis and animation, previous studies have not fully considered how facial feature differences between real faces and synthetic faces affect lip movement synthesis.

### Research Objectives

- **Evaluate the Impact of Facial Feature Differences**: Verify experimentally how similarity and difference in facial features (such as mouth height and mouth width) between real speakers and 3D synthetic characters affect lip movement synthesis.
- **Provide Guiding Principles**: Provide guidance for selecting appropriate 3D synthetic characters, especially when animating historical figures, the deceased, or virtual characters.

### Research Methods

1. **Data Classification**
   - Use the data in the Audio-Visual Lombard Grid speech corpus to classify the speakers' facial features into three categories: low, medium, and high.
2. **Mapping Process**
   - Map the lip movements of real speakers in 2D videos to the corresponding 3DMMs and generate 3D lip movements.
3. **Quantitative and Qualitative Evaluation**
   - Evaluate the quality of the 3D lip movements in each matching situation using metrics such as Root Mean Square Error (RMSE).
   - Observe and record the visual differences between the matching situations.

### Main Findings

- When a real speaker with a low mouth height in a 2D video is mapped to a 3D synthetic character with a high mouth height, or vice versa, the quality of the generated 3D lip movement is poor.
- This mismatch leads to a significant deviation in lip shape. It is most visible for certain phonemes, such as the bilabial /p/, where the lips cannot open and close correctly because of lip thickness.

### Conclusion

To ensure high-quality 3D lip movement synthesis, the match in facial features between real speakers and 3D synthetic characters must be carefully considered. Selecting 3D synthetic characters with similar facial features can significantly improve the final animation.
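
The two quantitative steps named under Research Methods, grouping speakers into low, medium, and high categories and scoring lip motion with RMSE, can be sketched as follows. This is only an illustration under stated assumptions: the summary does not say how the category boundaries were chosen, so the rank-based tercile split here is a stand-in, and the function names `classify_low_medium_high` and `rmse`, along with the sample measurements, are hypothetical.

```python
import math


def classify_low_medium_high(values):
    """Label each measurement 'low', 'medium', or 'high'.

    Assumption: cut points are taken at the terciles of the sorted
    measurements. The actual boundaries used in the paper are not
    given in the summary.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    low_cut = ordered[(n - 1) // 3]          # upper bound of the 'low' group
    high_cut = ordered[(2 * (n - 1)) // 3]   # upper bound of the 'medium' group
    labels = []
    for v in values:
        if v <= low_cut:
            labels.append("low")
        elif v <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


def rmse(predicted, reference):
    """Root mean square error between two equally long sequences of numbers.

    Assumption: both sequences hold the same quantity (for example, a
    per-frame mouth opening) sampled at the same frames.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and equally long")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical mouth-height measurements for six speakers (arbitrary units).
mouth_heights = [10, 12, 14, 16, 18, 20]
categories = classify_low_medium_high(mouth_heights)

# Hypothetical per-frame values for a synthesized and a reference lip track.
error = rmse([0.0, 0.0], [3.0, 4.0])
```

A lower RMSE between the synthesized and reference lip tracks indicates closer agreement, so comparing the score across matched and mismatched category pairs is one way to quantify the effect the findings describe.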