Submission of USTC’s System for the IWSLT 2023 - Offline Speech Translation Track

Xinyuan Zhou, Jianwei Cui, Zhongyi Ye, Yichi Wang, Luzhen Xu, Hanyi Zhang, Weitai Zhang, Lirong Dai
DOI: https://doi.org/10.18653/v1/2023.iwslt-1.15
2023-01-01
Abstract: This paper describes the submissions of the research group USTC-NELSLIP to the 2023 IWSLT Offline Speech Translation competition, which involves translating spoken English into written Chinese. We utilize both cascaded models and end-to-end models for this task. To improve the performance of the cascaded models, we introduce Whisper to reduce errors in the intermediate source language text, achieving a significant improvement in ASR recognition performance. For end-to-end models, we propose Stacked Acoustic-and-Textual Encoding extension (SATE-ex), which feeds the output of the acoustic decoder into the textual decoder for information fusion and to prevent error propagation. Additionally, we improve the performance of the end-to-end system in translating speech by combining the SATE-ex model with the encoder-decoder model through ensembling.