Generation of Artificial F0-contours of Emotional Speech with Generative Adversarial Networks

Shumpei Matsuoka, Yao Jiang, Akira Sasou
DOI: https://doi.org/10.1109/ssci44817.2019.9002917
2019-12-01
Abstract: Fundamental frequency (F0) contours play a very important role in reflecting the emotion, identity, intention, and attitude of a speaker in speech samples. In this paper, we adopted a generative adversarial network (GAN) to generate artificial F0 contours of emotional speech. GANs have limitations, however: unstable training frequently leads them to generate undesired data, and they can repeatedly generate identical or very similar data, a failure known as mode collapse. This study constructed a GAN-based generative model for F0 contours that stably generates more varied F0 contours fitting the statistical characteristics of the training data. We tested the classification rate of four kinds of emotions in the F0 contours generated by five kinds of generative models. We also evaluated the averaged local density of the generated F0 contours as a measure of their variety. Preliminary experiments confirmed the validity and effectiveness of the proposed generative model.
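The abstract does not define how the averaged local density is computed, so the following is only a minimal sketch under stated assumptions, not the authors' metric. It assumes each F0 contour is a fixed-length list of frequency values, takes the local density of a contour to be the inverse of its mean Euclidean distance to its k nearest neighbours, and averages that over the set. The function names, the choice of k, and the toy contours are all hypothetical. Under this reading, a higher value means the generated contours crowd together (less variety, as in mode collapse), and a lower value means they are more spread out.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length contours."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def averaged_local_density(contours, k=2):
    """Average of per-contour local densities.

    Assumption (not from the paper): local density of a contour is
    1 / (mean distance to its k nearest neighbours in the set).
    """
    n = len(contours)
    if n <= k:
        raise ValueError("need more than k contours")
    densities = []
    for i, contour in enumerate(contours):
        # Distances from this contour to every other contour, ascending.
        dists = sorted(
            euclidean(contour, other)
            for j, other in enumerate(contours)
            if j != i
        )
        mean_knn = sum(dists[:k]) / k
        # Small epsilon guards against division by zero for duplicates.
        densities.append(1.0 / (mean_knn + 1e-12))
    return sum(densities) / n


# Hypothetical toy data: F0 values in Hz at three time steps.
collapsed = [
    [100.0, 110.0, 120.0],
    [100.5, 110.5, 120.5],
    [101.0, 111.0, 121.0],
    [100.2, 110.2, 120.2],
]
varied = [
    [100.0, 110.0, 120.0],
    [150.0, 140.0, 130.0],
    [200.0, 220.0, 180.0],
    [90.0, 95.0, 160.0],
]

density_collapsed = averaged_local_density(collapsed)
density_varied = averaged_local_density(varied)
```

In this toy comparison the near-identical set yields the higher density, which is the behaviour one would expect from a generator suffering from mode collapse.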