Improved Long-Form Spoken Language Translation with Large Language Models

Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Axel H. Ng
DOI: https://doi.org/10.48550/arXiv.2212.09895
2022-12-20
Abstract: A challenge in spoken language translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we fine-tune a general-purpose, large language model to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We compare to several segmentation strategies and find that our approach improves BLEU score on three languages by an average of 2.7 BLEU overall compared to an automatic punctuation baseline. Further, we demonstrate the effectiveness of two constrained decoding strategies to improve well-formedness of the model output from above 99% to 100%.
Computation and Language
What problem does this paper attempt to address?
This paper addresses a key challenge in long-form spoken language translation: how to segment long automatic speech recognition (ASR) transcripts into units suitable for machine translation. Much spoken content is long-form, whereas high-quality translation typically requires shorter units. To resolve this mismatch, the authors fine-tune a general-purpose large language model to split long ASR transcripts into segments that can be translated independently, with the goal of maximizing overall translation quality. The main contributions of the paper include:
- Proposing a method based on a sliding-window algorithm that can handle long-form ASR transcripts while maintaining high precision.
- Comparing against several segmentation strategies and showing that the proposed method improves BLEU by an average of 2.7 points across three languages relative to an automatic punctuation baseline.
- Demonstrating two constrained decoding strategies that improve the well-formedness of the model output from above 99% to 100%.
Overall, the goal of the paper is to improve machine translation quality by improving the segmentation of long-form spoken content, especially when dealing with multi-sentence inputs.
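The sliding-window segmentation and the well-formedness requirement mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a segmenter receives a window of transcript words and returns the same words with a boundary marker inserted, that only the first predicted segment of each window is committed before the window advances, and that an output counts as well-formed when removing the markers recovers the input exactly. The names `DELIM`, `is_well_formed`, `sliding_window_segment`, and `toy_predict` are hypothetical, and `toy_predict` is a rule-based stand-in for a fine-tuned language model.

```python
from typing import Callable, List

DELIM = "|"  # hypothetical segment-boundary marker (an assumption, not from the paper)


def is_well_formed(window_words: List[str], output: str) -> bool:
    """Return True if dropping boundary markers from `output` recovers the input words."""
    return [tok for tok in output.split() if tok != DELIM] == list(window_words)


def sliding_window_segment(
    words: List[str],
    predict: Callable[[List[str]], str],
    window: int = 40,
) -> List[str]:
    """Segment `words` by calling `predict` on successive windows.

    Only the first predicted segment of each window is committed; the next
    window starts right after it, so text cut off by the window edge is
    re-examined with more right-hand context.
    """
    segments: List[str] = []
    start = 0
    while start < len(words):
        chunk = words[start:start + window]
        output = predict(chunk)
        if is_well_formed(chunk, output):
            first: List[str] = []
            for tok in output.split():
                if tok == DELIM:
                    if first:      # first boundary after some words: stop here
                        break
                    continue       # ignore a leading boundary marker
                first.append(tok)
        else:
            # Fallback for malformed output: keep the whole window as one segment.
            first = chunk
        segments.append(" ".join(first))
        start += len(first)        # always advances by at least one word
    return segments


def toy_predict(chunk: List[str]) -> str:
    """Stand-in segmenter: inserts a boundary after every third word."""
    out: List[str] = []
    for i, word in enumerate(chunk, 1):
        out.append(word)
        if i % 3 == 0 and i < len(chunk):
            out.append(DELIM)
    return " ".join(out)


if __name__ == "__main__":
    print(sliding_window_segment("a b c d e f g".split(), toy_predict, window=4))
```

In this sketch a malformed prediction is handled after the fact by a fallback. Constrained decoding, as described in the abstract, would instead aim to prevent malformed outputs during generation. The details of the two strategies are not given here, so they are not reproduced.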