A Model of Emotional Speech Generation Based on Conditional Generative Adversarial Networks

Chunjun Zheng, Wei Sun, Ning Jia
DOI: https://doi.org/10.1109/ihmsc.2019.00033
2019-08-01
Abstract: This paper presents a speech generation technique based on Conditional Generative Adversarial Networks (Conditional GAN). By introducing emotional conditions and learning the emotional information in a speech database, the model can independently generate new speech with a specified emotion. A Generative Adversarial Network consists of a Discriminator (D) and a Generator (G). TensorFlow is used as the learning framework, and the Conditional GAN model is trained on a large number of emotional speech samples. The generator network G and the discriminator network D form a dynamic “game process” that better learns the conditional distribution of emotional speech data. The generated samples are close to the natural speech signals of the original training data, are rich in diversity, and approximate real emotional speech. The proposed solution is evaluated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and a student emotional database, and shows more accurate results than existing emotional speech generation algorithms.
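The conditioning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the emotion label set, the noise dimension, and all function names are hypothetical, NumPy stands in for TensorFlow, and the generator uses the common non-saturating loss rather than anything confirmed by the paper. It shows only how an emotion label y can be attached to the generator input and how the two adversarial losses are computed from the discriminator's outputs.

```python
import numpy as np

# Hypothetical emotion label set; the paper's actual classes may differ.
EMOTIONS = ["angry", "happy", "neutral", "sad"]

def one_hot(label, classes=EMOTIONS):
    """Encode an emotion label as a one-hot condition vector y."""
    vec = np.zeros(len(classes))
    vec[classes.index(label)] = 1.0
    return vec

def generator_input(noise, emotion):
    """Concatenate the noise vector z with the emotion condition y."""
    return np.concatenate([noise, one_hot(emotion)])

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """-[log D(x|y) + log(1 - D(G(z|y)|y))], averaged over the batch."""
    return float(-np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps)))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: -log D(G(z|y)|y), batch average."""
    return float(-np.mean(np.log(d_fake + eps)))

rng = np.random.default_rng(0)
z = rng.standard_normal(16)          # assumed noise dimension of 16
g_in = generator_input(z, "happy")   # 16 noise values + 4 condition values
```

In a full model, D would receive the same condition vector alongside each real or generated sample, so that it judges whether a sample is realistic for the stated emotion and not merely realistic in general.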