End-to-End Spoken Language Translation

Michelle Guo, Albert Haque, Prateek Verma
DOI: https://doi.org/10.48550/arXiv.1904.10760
2019-04-23
Abstract: In this paper, we address the task of spoken language understanding. We present a method for translating spoken sentences from one language into spoken sentences in another language. Given spectrogram-spectrogram pairs, our model can be trained completely from scratch to translate unseen sentences. Our method consists of a pyramidal-bidirectional recurrent network combined with a convolutional network to output sentence-level spectrograms in the target language. Empirically, our model achieves performance competitive with state-of-the-art methods on multiple languages and can generalize to unseen speakers.
Computation and Language, Sound, Audio and Speech Processing
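
The abstract mentions a pyramidal recurrent network but gives no layer details. As a rough illustration only, the sketch below shows the time-reduction step commonly used in pyramidal recurrent encoders: adjacent frames are concatenated so each layer sees half as many time steps. It is an assumption that the paper uses this exact reduction; the recurrent and convolutional layers are omitted, and the names `pyramid_step` and `pyramid` and the toy spectrogram are hypothetical.

```python
from typing import List

Frame = List[float]  # one spectrogram frame: a value per frequency bin


def pyramid_step(frames: List[Frame]) -> List[Frame]:
    """Halve the time resolution by concatenating adjacent frame pairs."""
    usable = len(frames) - (len(frames) % 2)  # drop a trailing odd frame
    return [frames[t] + frames[t + 1] for t in range(0, usable, 2)]


def pyramid(frames: List[Frame], num_layers: int) -> List[Frame]:
    """Apply the reduction once per (hypothetical) encoder layer."""
    for _ in range(num_layers):
        frames = pyramid_step(frames)
    return frames


# Toy input: 8 time steps, 3 frequency bins per frame (made-up values).
spectrogram = [[float(t)] * 3 for t in range(8)]
reduced = pyramid(spectrogram, num_layers=2)
print(len(reduced), len(reduced[0]))  # 2 12
```

Each reduction trades time steps for feature width, which shortens the sequence a recurrent layer has to process. In a full model a bidirectional recurrent layer would sit between reductions; that part is not shown here.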