Acquisition of Lip-Sync Expressions Using Transfer Learning for Text-to-Speech Emotional Expression Agents

Shintaro Kondo, Seiichi Harata, Takuto Sakuma, Shohei Kato
DOI: https://doi.org/10.1109/gcce53005.2021.9621872
2021-10-12
Abstract: In this study, we trained an LSTM on a dataset of emotional text reading and generated facial expressions for emotional text reading. However, the dataset contains few speech patterns, so the resulting lip-sync capability is insufficient. In this paper, we transfer knowledge of lip-sync expression from the Weekly Address dataset, which is used for training a facial expression generation network, to improve facial expressions during emotional read-out. As a result, the naturalness of the model improved, and positive emotions were expressed more strongly than in the previous study.