Learning-From-Mistakes Prompting for Indigenous Language Translation

You-Cheng Liao, Chen-Jui Yu, Chi-Yi Lin, He-Feng Yun, Yen-Hsiang Wang, Hsiao-Min Li, Yao-Chung Fan
2024-07-18
Abstract: Using large language models, this paper presents techniques to improve translation of extremely low-resourced indigenous languages. Our approaches are grounded in (1) a datastore consisting of a limited number of parallel translation examples, (2) the inherent capabilities of LLMs like GPT-3.5, and (3) a word-level translation dictionary. In this setting, we harness LLMs and in-context learning techniques to use LLMs as universal translators for extremely low-resourced languages. Our methodology hinges on utilizing LLMs as language compilers for selected language pairs, hypothesizing that they could internalize syntactic structures to facilitate accurate translation. We introduce three techniques: KNN-Prompting with Retrieved Prompting Context, Chain-of-Thought Prompting, and Learning-from-Mistakes Prompting, with the last method addressing past errors. The evaluation results suggest that, even with limited corpora, LLMs can effectively translate extremely low-resource languages when paired with proper prompting.
Computation and Language
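The combination of a small parallel datastore and a word-level dictionary described in the abstract can be illustrated with a short Python sketch of how a retrieved prompting context might be assembled. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Example` record, the Jaccard token-overlap retriever, the prompt wording, and the assumption that Chinese input arrives already segmented into space-separated words are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


# Hypothetical sketch: assemble a KNN-retrieved prompting context.
# Retriever, prompt wording, and pre-segmented input are assumptions.
@dataclass
class Example:
    source: str  # Chinese sentence, assumed pre-segmented into space-separated words
    target: str  # indigenous-language translation from the parallel datastore


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity; a stand-in for whatever retriever is actually used."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_knn(query: str, datastore: list[Example], k: int = 3) -> list[Example]:
    """Return the k datastore examples whose source side overlaps most with the query."""
    q = set(query.split())
    ranked = sorted(datastore, key=lambda ex: jaccard(q, set(ex.source.split())), reverse=True)
    return ranked[:k]


def lookup_words(query: str, dictionary: dict[str, list[str]]) -> dict[str, list[str]]:
    """Collect word-level dictionary entries for every query word the dictionary covers."""
    return {w: dictionary[w] for w in query.split() if w in dictionary}


def build_rpc_prompt(query: str, datastore: list[Example],
                     dictionary: dict[str, list[str]], k: int = 3) -> str:
    """Concatenate retrieved examples, dictionary glosses, and the query into one prompt."""
    lines = ["Translate from Chinese to the target indigenous language.", "", "Similar examples:"]
    for ex in retrieve_knn(query, datastore, k):
        lines.append(f"Chinese: {ex.source}")
        lines.append(f"Translation: {ex.target}")
    lines += ["", "Dictionary entries:"]
    for word, glosses in lookup_words(query, dictionary).items():
        lines.append(f"{word}: {', '.join(glosses)}")
    lines += ["", f"Chinese: {query}", "Translation:"]
    return "\n".join(lines)
```

Sorting by similarity and slicing keeps the k nearest neighbours; a real system might swap in a dedicated Chinese word segmenter and a stronger retriever than raw token overlap without changing the overall prompt layout.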
What problem does this paper attempt to address?
The paper addresses how to improve translation for extremely low-resource indigenous languages using large language models (LLMs). Specifically, the researchers aim to improve translation quality from Chinese into Taiwanese indigenous languages with three methods:
1. **KNN-Prompting with Retrieved Prompting Context (RPC)**: The method retrieves examples similar to the sentence to be translated and combines them with a word-level translation dictionary, aiming to improve the LLM's understanding of the target language's grammar and syntax.
2. **Chain-of-Thought (CoT) Prompting**: The method supplies chain-of-thought examples that guide the LLM to use the RPC more effectively during translation.
3. **Learning-from-Mistakes (LFM) Prompting**: The method feeds past translation errors back to the model as feedback to further improve the translation results.
Together, these methods rely on the intrinsic understanding and reasoning capabilities of LLMs to translate extremely low-resource languages when only limited parallel corpora are available. The researchers focus in particular on languages the model was not exposed to during pre-training, thereby extending the reach of LLMs in low-resource language translation.
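The Learning-from-Mistakes (LFM) idea of feeding past errors back to the model can be sketched as a small mistake memory. Again, this is a hypothetical illustration rather than the paper's procedure: the `translate` callable stands in for an LLM call, an output counts as a mistake whenever it differs from the reference string (a deliberately crude criterion), the mistakes shown to the model are chosen by simple word overlap with the new input, and `context` is assumed to hold the instructions and retrieved examples without the query line.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical sketch of a Learning-from-Mistakes prompt; the error criterion,
# mistake selection, and prompt wording are illustrative assumptions.
@dataclass
class Mistake:
    source: str     # Chinese input
    wrong: str      # earlier model output judged incorrect
    reference: str  # reference translation from the parallel data


def collect_mistakes(pairs: list[tuple[str, str]],
                     translate: Callable[[str], str]) -> list[Mistake]:
    """Translate known pairs and record every output that differs from its reference."""
    mistakes = []
    for source, reference in pairs:
        output = translate(source)
        # Crude criterion: any exact-string mismatch (ignoring outer whitespace) is a mistake.
        if output.strip() != reference.strip():
            mistakes.append(Mistake(source, output, reference))
    return mistakes


def overlap(a: str, b: str) -> int:
    """Number of distinct words shared by two space-separated strings."""
    return len(set(a.split()) & set(b.split()))


def build_lfm_prompt(query: str, context: str, mistakes: list[Mistake], k: int = 2) -> str:
    """Append the k past mistakes most similar to the query, then the query itself."""
    relevant = sorted(mistakes, key=lambda m: overlap(query, m.source), reverse=True)[:k]
    parts = [context, "", "Past mistakes to avoid:"]
    for m in relevant:
        parts.append(f"Chinese: {m.source}")
        parts.append(f"Incorrect: {m.wrong}")
        parts.append(f"Correct: {m.reference}")
    parts += ["", f"Chinese: {query}", "Translation:"]
    return "\n".join(parts)
```

Showing each wrong output next to its reference gives the model a concrete contrast to learn from in context; how mistakes are actually detected and selected is a design choice this sketch does not settle.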