Improved Long-Form Spoken Language Translation with Large Language Models

Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Axel H. Ng

DOI: https://doi.org/10.48550/arXiv.2212.09895

2022-12-20

Abstract: A challenge in spoken language translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we fine-tune a general-purpose, large language model to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We compare to several segmentation strategies and find that our approach improves BLEU score on three languages by an average of 2.7 BLEU overall compared to an automatic punctuation baseline. Further, we demonstrate the effectiveness of two constrained decoding strategies to improve well-formedness of the model output from above 99% to 100%.
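The abstract does not define well-formedness. One plausible reading is that the model may only insert segment boundaries and must otherwise reproduce the transcript unchanged. The following is a minimal sketch of such a check under that assumption; the `<sep>` marker and both function names are hypothetical and are not taken from the paper.

```python
from typing import List

SEP = "<sep>"  # hypothetical boundary marker; the source does not name one


def is_well_formed(transcript: str, model_output: str, sep: str = SEP) -> bool:
    """True if the output equals the transcript once boundary markers are removed."""
    source_tokens = transcript.split()
    output_tokens = [tok for tok in model_output.split() if tok != sep]
    return source_tokens == output_tokens


def to_segments(model_output: str, sep: str = SEP) -> List[str]:
    """Split an output on the boundary marker, dropping empty segments."""
    parts = (" ".join(part.split()) for part in model_output.split(sep))
    return [part for part in parts if part]


if __name__ == "__main__":
    transcript = "so today we talk about translation it is hard"
    output = "so today we talk about translation <sep> it is hard"
    print(is_well_formed(transcript, output))
    print(to_segments(output))
```

A post-hoc check like this can only detect malformed outputs. Constrained decoding, as described in the abstract, instead aims to prevent them during generation.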

Computation and Language

What problem does this paper attempt to address?

This paper addresses a key challenge in long-form spoken language translation: how to segment long-form automatic speech recognition (ASR) transcripts into units suitable for machine translation. Much spoken content is long-form, but high-quality translation typically requires shorter units. To resolve this mismatch, the authors fine-tune a general-purpose large language model to split long ASR transcripts into segments that can be translated independently, with the aim of maximizing overall translation quality.

The main contributions of the paper include:

- Proposing a method based on a sliding-window algorithm that can handle long-form ASR transcripts.
- Comparing against several segmentation strategies: the proposed method improves BLEU by an average of 2.7 points across three languages relative to an automatic punctuation baseline.
- Showing that two constrained decoding strategies raise the well-formedness of the model output from above 99% to 100%.

Overall, the goal of the paper is to improve machine translation quality by improving segmentation of long-form spoken-language content, especially when dealing with multi-sentence inputs.
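The summary mentions a sliding-window algorithm but gives no details. As an illustration only, here is one way a sliding window over a long transcript could work: segment a fixed-size window, commit every segment except the trailing one, and start the next window at that trailing piece. The window size, the `Segmenter` interface, and the forced cut when no boundary is found are all assumptions of this sketch, not the authors' procedure. `toy_segmenter` is a hypothetical stand-in for the fine-tuned model.

```python
from typing import Callable, List

# A segmenter maps a window of tokens to cut positions: a cut at i means a
# segment ends just before window[i]. Hypothetical interface for illustration.
Segmenter = Callable[[List[str]], List[int]]


def sliding_window_segment(tokens: List[str], segmenter: Segmenter,
                           window: int = 40) -> List[str]:
    segments: List[str] = []
    start = 0
    while start < len(tokens):
        chunk = tokens[start:start + window]
        is_last = start + window >= len(tokens)
        cuts = sorted({c for c in segmenter(chunk) if 0 < c < len(chunk)})
        if is_last:
            cuts.append(len(chunk))  # final window: flush the remainder
        elif not cuts:
            cuts = [len(chunk)]  # no boundary found: force a cut to make progress
        prev = 0
        for cut in cuts:
            segments.append(" ".join(chunk[prev:cut]))
            prev = cut
        # Tokens after the last cut are re-read as the start of the next window.
        start += prev
    return segments


def toy_segmenter(chunk: List[str]) -> List[int]:
    """Toy rule: start a new segment before every non-initial 'then'."""
    return [i for i, tok in enumerate(chunk) if i > 0 and tok == "then"]


if __name__ == "__main__":
    words = "we met at noon then we ate lunch then we left".split()
    for segment in sliding_window_segment(words, toy_segmenter, window=6):
        print(segment)
```

Deferring the trailing piece to the next window means a segment is never committed before the model has seen some of the context that follows it. This is the usual motivation for overlapping windows.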

Large-scale Language Model Rescoring on Long-form Data

Large Language Models for Expansion of Spoken Language Understanding Systems to New Languages

Decoding with Large-Scale Neural Language Models Improves Translation

Lightweight Audio Segmentation for Long-form Speech Translation

Iterative Translation Refinement with Large Language Models

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study

Speech Translation with Large Language Models: An Industrial Practice

Re-Translation Strategies for Long Form, Simultaneous, Spoken Language Translation

Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR

A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Enhancing Document-level Translation of Large Language Model via Translation Mixed-instructions

On Efficient Coupling of ASR and SMT for Speech Translation

A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition

Tuning Large language model for End-to-end Speech Translation

LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models

How Much Data is Enough Data? Fine-Tuning Large Language Models for In-House Translation: Performance Evaluation Across Multiple Dataset Sizes

Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers

Prompting Large Language Models with Speech Recognition Abilities

Translate-and-Revise: Boosting Large Language Models for Constrained Translation