Abstract: Simultaneous speech to speech translation aims to interpret concurrently with the speech in the source language, which is of great importance to the real-time understanding of spoken lectures or conversations. Previous methods usually divide this problem into three stages: simultaneous automatic speech recognition (ASR), simultaneous neural machine translation (NMT), and simultaneous text to speech (TTS). Such a cascade is not end-to-end and suffers from translation delay and error propagation. In this work, we propose SimulS2S, an end-to-end simultaneous speech to speech translation system that translates source-language speech directly into target-language speech concurrently and jointly optimizes speech recognition, text translation and speech synthesis in one sequence to sequence model. SimulS2S consists of a speech encoder and a speech decoder, both with a speech segmenter and a wait-k strategy for simultaneous translation. Since simultaneous speech to speech translation is challenging, we propose several key techniques to help the training of SimulS2S: 1) a curriculum learning mechanism that trains the model gradually from full-sentence translation to simultaneous translation; 2) two auxiliary tasks, ASR and S2T (speech to text translation), that share the same encoder with the SimulS2S model to help the training of the encoder; 3) knowledge distillation to transfer the knowledge from the cascaded NMT and TTS models to the SimulS2S model. Experiments on Fisher Spanish-English conversation translation datasets demonstrate that SimulS2S 1) achieves low translation delay and reasonable translation quality compared with full …
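The wait-k strategy mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly described form of the policy: read k source segments first, then alternate between writing one target unit and reading one more source segment until the source is exhausted. The function name `wait_k_schedule` and its parameters are hypothetical and chosen for illustration; how SimulS2S combines this schedule with its speech segmenter is not specified in the abstract.

```python
from typing import List, Tuple


def wait_k_schedule(num_source: int, num_target: int, k: int) -> List[Tuple[str, int]]:
    """Return the READ/WRITE action sequence of a wait-k style policy.

    Assumption (not taken from the abstract): before emitting target unit t
    (0-indexed), the policy has read min(k + t, num_source) source segments.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    actions: List[Tuple[str, int]] = []
    read = 0  # number of source segments consumed so far
    for t in range(num_target):
        # Source segments required before target unit t may be written.
        needed = min(k + t, num_source)
        while read < needed:
            actions.append(("READ", read))
            read += 1
        actions.append(("WRITE", t))
    return actions


if __name__ == "__main__":
    # Print the schedule for 5 source segments, 5 target units, k = 2.
    for action, index in wait_k_schedule(num_source=5, num_target=5, k=2):
        print(action, index)
```

Under these assumptions, a smaller k lowers the delay before the first output at the cost of less source context per decision, while a k at least as large as the number of source segments reduces to full-sentence translation.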

Anticipation-free Training for Simultaneous Translation

Anticipation-Free Training for Simultaneous Machine Translation

SimulS2S: End-to-End Simultaneous Speech to Speech Translation

Better Simultaneous Translation with Monotonic Knowledge Distillation

Divergence-Guided Simultaneous Speech Translation

Hybrid-Regressive Paradigm for Accurate and Speed-Robust Neural Machine Translation

STACL: Simultaneous Translation with Integrated Anticipation and Controllable Latency

How to Do Simultaneous Translation Better with Consecutive Neural Machine Translation?

Learning to Use Future Information in Simultaneous Translation

Monotonic Simultaneous Translation with Chunk-wise Reordering and Refinement

A General Framework for Adaptation of Neural Machine Translation to Simultaneous Translation

Simpler and Faster Learning of Adaptive Policies for Simultaneous Translation

Anticipating Future with Large Language Model for Simultaneous Machine Translation

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation

Learn to Use Future Information in Simultaneous Translation

SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation

Non-autoregressive Streaming Transformer for Simultaneous Translation

Learning Adaptive Segmentation Policy for Simultaneous Translation

Fixed and Adaptive Simultaneous Machine Translation Strategies Using Adapters

Improving Simultaneous Machine Translation with Monolingual Data

Context Consistency Between Training and Testing in Simultaneous Machine Translation