Improving Multi-Speaker Tacotron with Speaker Gating Mechanisms

Wei Zhao, Li Xu, Ting He
DOI: https://doi.org/10.23919/ccc50068.2020.9188779
2020-01-01
Abstract: In this paper, we present two speaker gating mechanisms for multi-speaker Tacotron, a popular end-to-end neural text-to-speech (TTS) system, to improve its ability to generate multiple voices. With the proposed mechanisms, the model improves in both generalization and accuracy. As a starting point, we adopt the original multi-speaker Tacotron as the baseline model because of its strong performance and straightforward structure. We then propose two different speaker gating mechanisms for this model, both built on gated linear units (GLUs). Extensive experiments on the VCTK dataset demonstrate the validity of our methods. In conclusion, we find that incorporating speaker identity information through the proposed speaker gating mechanisms is promising.
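For readers unfamiliar with GLUs, the sketch below shows the standard gated linear unit, GLU(x) = (Wx + b) ⊙ σ(Vx + c), and one hypothetical way a speaker embedding could drive the gate. The abstract does not say how the authors wire speaker identity into their two mechanisms, so `speaker_gate` and all parameter names here are illustrative assumptions, not the paper's method. Plain Python lists stand in for tensors to keep the example self-contained.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, used as the gate nonlinearity."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(m: Matrix, v: Vector, bias: Vector) -> Vector:
    """Compute m @ v + bias for a row-major matrix m."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(m, bias)]


def glu(x: Vector, w: Matrix, b: Vector, v: Matrix, c: Vector) -> Vector:
    """Standard GLU: (W x + b) * sigmoid(V x + c), elementwise."""
    linear = affine(w, x, b)
    gate = [sigmoid(g) for g in affine(v, x, c)]
    return [lin * g for lin, g in zip(linear, gate)]


def speaker_gate(h: Vector, spk: Vector,
                 w_h: Matrix, b_h: Vector,
                 w_s: Matrix, b_s: Vector) -> Vector:
    """HYPOTHETICAL speaker-conditioned GLU (not confirmed by the paper):
    the linear path comes from the hidden state h, while the gate is
    computed from the speaker embedding spk, so each speaker can scale
    individual channels of the shared representation."""
    linear = affine(w_h, h, b_h)
    gate = [sigmoid(g) for g in affine(w_s, spk, b_s)]
    return [lin * g for lin, g in zip(linear, gate)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    h = [2.0, -4.0]
    spk = [0.0]           # toy 1-D speaker embedding
    w_s = [[0.0], [0.0]]  # gate weights of zero give sigmoid(0) = 0.5
    print(speaker_gate(h, spk, identity, [0.0, 0.0], w_s, [0.0, 0.0]))  # [1.0, -2.0]
```

Computing the gate from the speaker embedding alone is only one design choice; a variant could feed the concatenation of `h` and `spk` into the gate so that the gating depends on both content and speaker.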