Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning

Noé Tits, Kevin El Haddad, Thierry Dutoit
DOI: https://doi.org/10.48550/arXiv.2008.09483
2020-08-20
Abstract: Despite the growing interest in expressive speech synthesis, the synthesis of nonverbal expressions remains under-explored. In this paper, we propose an audio laughter synthesis system based on a sequence-to-sequence TTS synthesis system. We leverage transfer learning by training a deep learning model to generate both speech and laughs from annotations. We evaluate our model with a listening test, comparing it to an HMM-based laughter synthesis system, and find that it reaches higher perceived naturalness. Our solution is a first step towards a TTS system able to synthesize speech with control over the amusement level through laughter integration.
Audio and Speech Processing, Computation and Language, Machine Learning, Sound