Read it to me: An emotionally aware Speech Narration Application

Rishibha Bansal
DOI: https://doi.org/10.48550/arXiv.2209.02785
2022-09-07
Abstract: In this work, we attempt emotional style transfer on audio. In particular, we explore the MelGAN-VC architecture for transfer between various emotion pairs. The generated audio is then classified using an LSTM-based audio emotion classifier. We find that "sad" audio is generated better than "happy" or "anger" audio, as people express sadness in similar ways.
Sound, Computation and Language, Machine Learning, Audio and Speech Processing