Invariant Motion Representation Learning for 3D Talking Face Synthesis

Jiyuan Liu, Wenping Wei, Zhendong Li, Guanfeng Li, Hao Liu
DOI: https://doi.org/10.1109/ICASSP48485.2024.10446379
2024-01-01
Abstract: In this paper, we propose an invariant motion representation learning method for deformable talking face synthesis. Conventional NeRF-based methods learn to match audio to motion without considering motion consistency, which leads to blurry results, especially when face sequences are captured in the wild. To address this limitation, our model explores audio-motion invariance directly from video clips and exploits the facial movements driven by any given piece of speech. Specifically, we develop a motion invariance module and an audio-motion contrastive learning module, and then use the produced facial motion to probe facial landmarks into intra-person identity and intra-motion classes. Our proposed cycle-loop paradigm thus reinforces lip synchronization and inter-frame consistency. Experimental results demonstrate the effectiveness of our method.
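The abstract does not state the form of the audio-motion contrastive objective. For orientation only, here is a minimal sketch of a standard symmetric InfoNCE loss, assuming that each batch contains paired audio and motion embeddings from the same frame window (positives on the diagonal, all other pairings treated as negatives). The function names, the embedding format (plain lists of floats), and the temperature of 0.07 are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_matrix(a: List[Vector], b: List[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities between rows of a and rows of b."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def audio_motion_infonce(audio: List[Vector], motion: List[Vector],
                         temperature: float = 0.07) -> float:
    """Hypothetical symmetric InfoNCE loss between audio and motion embeddings.

    Assumption: audio[i] and motion[i] come from the same time window
    (positive pair); every other audio/motion pairing in the batch is a negative.
    """
    if len(audio) != len(motion) or not audio:
        raise ValueError("need equal, non-empty batches of audio and motion embeddings")
    sim = cosine_matrix(audio, motion)
    logits = [[s / temperature for s in row] for row in sim]   # audio -> motion
    logits_t = [list(col) for col in zip(*logits)]            # motion -> audio
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))
```

Correctly aligned pairs yield a loss near zero, while swapping the motion embeddings so that each audio clip is paired with the wrong motion yields a large loss. This is the signal a contrastive module of this kind uses to pull matching audio and motion together in embedding space.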