Memory-Augmented Speech-to-Text Translation with Multi-Scale Context Translation Strategy

Yuxuan Yuan, Yue Zhou, Xiaodong Shi
DOI: https://doi.org/10.1109/ICASSP48485.2024.10447450
2024-01-01
Abstract: End-to-end speech-to-text translation (ST) has demonstrated promising results on sentence-level translation. In real-world scenarios, however, audio is typically long and requires cross-sentence contextual connections for translation. Sentence-level ST models struggle in this setting because they cannot capture inter-sentential context. Although context information has proved effective for document-level machine translation, incorporating it into ST remains under-explored. In this paper, we propose memory-augmented speech-to-text translation, which leverages a memory module to perform context-aware translation. To enhance the memory module's ability to extract information from context, we develop the Multi-Scale Context Translation Strategy (MSCTS), which translates segments with different sizes of context. Experiments on the MuST-C benchmark show that our proposed method significantly improves context-aware ST, outperforming the strong sentence-level baseline by +0.8 BLEU on average.