Prompting Large Language Model for Machine Translation: A Case Study

Biao Zhang, Barry Haddow, Alexandra Birch
2023-01-18
Abstract: Research on prompting has shown excellent performance with little or even no supervised training across many tasks. However, prompting for machine translation is still under-explored in the literature. We fill this gap by offering a systematic study on prompting strategies for translation, examining various factors for prompt template and demonstration example selection. We further explore the use of monolingual data and the feasibility of cross-lingual, cross-domain, and sentence-to-document transfer learning in prompting. Extensive experiments with GLM-130B (Zeng et al., 2022) as the testbed show that 1) the number and the quality of prompt examples matter, where using suboptimal examples degrades translation; 2) several features of prompt examples, such as semantic similarity, show significant Spearman correlation with their prompting performance; yet, none of the correlations are strong enough; 3) using pseudo parallel prompt examples constructed from monolingual data via zero-shot prompting could improve translation; and 4) improved performance is achievable by transferring knowledge from prompt examples selected in other settings. We finally provide an analysis of the model outputs and discuss several problems that prompting still suffers from.
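The abstract reports Spearman correlations between features of prompt examples (such as semantic similarity) and their prompting performance. As a rough illustration of what that statistic measures, the sketch below computes Spearman's rho in plain Python as the Pearson correlation of average ranks. The function names and the toy feature and score values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Toy values for illustration only (not results from the paper):
similarity = [0.12, 0.35, 0.40, 0.58, 0.71]  # hypothetical feature per prompt example
score = [18.0, 21.5, 20.9, 24.2, 25.0]       # hypothetical translation score per example
print(round(spearman(similarity, score), 3))  # 0.9
```

A rho near 1 means examples that rank higher on the feature also tend to rank higher in translation quality. The abstract notes that none of the observed correlations were strong enough, which suggests that a single feature of this kind may not reliably identify good prompt examples on its own.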
Computation and Language, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper primarily explores how to use large language models (LLMs) for machine translation (MT). Specifically, it addresses the following core questions:

1. **Prompt Strategies**:
   - Which prompt template is best suited to machine translation, and how do templates perform across different languages?
   - Do demonstration examples affect translation quality, and how should the best demonstration examples be selected?
2. **Use of Monolingual Data**:
   - How can monolingual data be used to improve translation quality? Is it effective to use monolingual data directly as demonstration examples?
   - Can pseudo-parallel data constructed through back-translation or forward-translation improve translation quality?
3. **Possibility of Transfer Learning**:
   - Are demonstration examples transferable across settings, such as different domains, different language pairs, or from sentence-level to document-level translation?
   - Can cross-domain demonstration examples improve translation performance?

Through this study, the paper aims to fill the gap in the literature on how to use prompting effectively for machine translation, and to examine the effects and remaining problems of different prompting strategies.
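Questions 1 and 2 concern how a few-shot translation prompt is assembled: a template, demonstration examples chosen for the test input, and, where parallel data is scarce, pseudo-parallel examples produced by zero-shot translation of monolingual text. The sketch below shows that pipeline under several assumptions: the `<source language>: <sentence>` / `<target language>:` layout is one plausible template, not a claim about the paper's best template; word-overlap (Jaccard) similarity is a simple lexical stand-in for the semantic similarity studied in the paper; and `translate_fn` is a hypothetical placeholder for a zero-shot call to a model such as GLM-130B. All function and class names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Example:
    source: str
    target: str


def format_pair(src_lang: str, tgt_lang: str, source: str, target: str = "") -> str:
    # Hypothetical template "<src lang>: <sentence>\n<tgt lang>: <translation>".
    # An empty target leaves the slot open for the model to complete.
    return f"{src_lang}: {source}\n{tgt_lang}:" + (f" {target}" if target else "")


def jaccard(a: str, b: str) -> float:
    """Lexical word-overlap similarity; a stand-in for an embedding-based score."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def select_demos(query: str, pool: Sequence[Example], k: int) -> List[Example]:
    """Pick the k pool examples whose source side is most similar to the query."""
    ranked = sorted(pool, key=lambda ex: jaccard(query, ex.source), reverse=True)
    return ranked[:k]


def pseudo_parallel(texts: Sequence[str], translate_fn: Callable[[str], str],
                    back: bool = False) -> List[Example]:
    """Pair monolingual text with zero-shot translations.

    back=False: texts are source-side and the model output becomes the pseudo
    target (forward translation). back=True: texts are target-side and the model
    output becomes the pseudo source (back-translation).
    """
    if back:
        return [Example(translate_fn(t), t) for t in texts]
    return [Example(t, translate_fn(t)) for t in texts]


def build_prompt(query: str, demos: Sequence[Example],
                 src_lang: str, tgt_lang: str) -> str:
    """Concatenate demonstrations, then the query with an open target slot."""
    blocks = [format_pair(src_lang, tgt_lang, d.source, d.target) for d in demos]
    blocks.append(format_pair(src_lang, tgt_lang, query))
    return "\n\n".join(blocks)


pool = [
    Example("the cat is here", "die Katze ist hier"),
    Example("stock prices fell", "die Aktienkurse fielen"),
    Example("the dog is here", "der Hund ist hier"),
]
demos = select_demos("the cat is there", pool, k=2)
prompt = build_prompt("the cat is there", demos, "English", "German")
print(prompt)
```

Swapping `jaccard` for an embedding-based similarity, or filling the pool with `pseudo_parallel(...)` output (with `back=True` for target-side monolingual text), changes only the selection and data-construction steps; the prompt assembly stays the same.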