Text-to-Speech Pipeline for Swiss German -- A comparison

Tobias Bollinger, Jan Deriu, Manfred Vogel
2023-05-31
Abstract: In this work, we studied the synthesis of Swiss German speech using different Text-to-Speech (TTS) models. We evaluated the TTS models on three corpora and found that VITS models performed best, so we used them for further testing. We also introduce a new method to evaluate TTS models by letting the discriminator of a trained vocoder GAN model predict whether a given waveform is human or synthesized. In summary, our best model delivers speech synthesis for different Swiss German dialects with previously unachieved quality.
Computation and Language, Sound, Audio and Speech Processing