Learning-From-Mistakes Prompting for Indigenous Language Translation

You-Cheng Liao, Chen-Jui Yu, Chi-Yi Lin, He-Feng Yun, Yen-Hsiang Wang, Hsiao-Min Li, Yao-Chung Fan

2024-07-18

Abstract: Using large language models, this paper presents techniques to improve extremely low-resourced indigenous language translations. Our approaches are grounded in the use of (1) a datastore consisting of a limited number of parallel translation examples, (2) the inherent capabilities of LLMs like GPT-3.5, and (3) a word-level translation dictionary. We harness the potential of LLMs and in-context learning techniques in such a setting to use LLMs as universal translators for extremely low-resourced languages. Our methodology hinges on utilizing LLMs as language compilers for selected language pairs, hypothesizing that they could internalize syntactic structures to facilitate accurate translation. We introduce three techniques: KNN-Prompting with Retrieved Prompting Context, Chain-of-Thought Prompting, and Learning-from-Mistakes Prompting, with the last method addressing past errors. The evaluation results suggest that, even with limited corpora, LLMs can effectively translate extremely low-resource languages when paired with proper prompting.

Computation and Language

What problem does this paper attempt to address?

The paper attempts to address the issue of improving translation for extremely low-resource indigenous languages using large language models (LLMs). Specifically, the researchers aim to enhance the translation quality from Chinese to Taiwanese indigenous languages through the following three methods:

1. **KNN-Prompting with Retrieved Prompting Context (RPC)**: By retrieving examples similar to the context of the sentence to be translated and combining them with a word-level translation dictionary, this method aims to enhance the LLM's understanding of the target language's grammar and syntax.
2. **Chain-of-Thought (CoT) Prompting**: By providing chain-of-thought examples, this method guides the LLM to more effectively utilize RPC for translation.
3. **Learning-from-Mistakes (LFM) Prompting**: By introducing past translation errors as a feedback mechanism, this method further optimizes the translation results.

These methods aim to leverage the intrinsic understanding and reasoning capabilities of LLMs to achieve effective translation for extremely low-resource languages when only limited parallel corpora are available. The researchers particularly focus on languages that the model has not been exposed to during the pre-training phase, thereby expanding the application scope of LLMs in low-resource language translation.
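The three methods above can be sketched as a single prompt-building routine. This is a minimal illustration under stated assumptions, not the authors' implementation: the character-bigram similarity stands in for whatever retrieval model the paper uses, the prompt wording is invented for this sketch, and the function names (`retrieve_knn`, `build_prompt`) and the toy data are hypothetical. The target-side tokens such as `w_dog` are placeholders, not words from any real language, and English stands in for the Chinese source side.

```python
from collections import Counter


def char_bigrams(text):
    """Return the multiset of character bigrams in text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def similarity(a, b):
    """Dice overlap of character bigrams (assumed stand-in for a sentence encoder)."""
    ca, cb = char_bigrams(a), char_bigrams(b)
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 0.0
    overlap = sum((ca & cb).values())  # multiset intersection
    return 2.0 * overlap / total


def retrieve_knn(source, datastore, k=2):
    """Return the k parallel pairs whose source side is most similar to `source`."""
    ranked = sorted(datastore, key=lambda pair: similarity(source, pair[0]), reverse=True)
    return ranked[:k]


def build_prompt(source, datastore, dictionary, mistakes=(), k=2):
    """Assemble retrieved examples, dictionary entries, and past mistakes into one prompt."""
    lines = ["Translate the source sentence into the target language.", "", "Similar examples:"]
    for src, tgt in retrieve_knn(source, datastore, k):
        lines.append(f"- {src} => {tgt}")

    # Word-level dictionary lookups for the tokens that appear in the source.
    lines += ["", "Dictionary entries:"]
    for token in source.split():
        if token in dictionary:
            lines.append(f"- {token}: {dictionary[token]}")

    # Learning-from-mistakes: show earlier wrong outputs next to their corrections.
    if mistakes:
        lines += ["", "Past mistakes to avoid:"]
        for src, wrong, right in mistakes:
            lines.append(f"- {src}: wrong = {wrong}; correct = {right}")

    # Chain-of-thought style instruction (wording is an assumption).
    lines += [
        "",
        "Think step by step: look up each source word, then order the words as in the examples.",
        f"Source: {source}",
        "Translation:",
    ]
    return "\n".join(lines)


# Toy placeholder data; none of it comes from the paper.
DATASTORE = [
    ("the dog sleeps", "w_sleep w_dog"),
    ("the cat sleeps", "w_sleep w_cat"),
    ("the child eats rice", "w_eat w_rice w_child"),
]
DICTIONARY = {"dog": "w_dog", "cat": "w_cat", "eats": "w_eat", "rice": "w_rice"}
MISTAKES = [("the dog sleeps", "w_dog w_sleep", "w_sleep w_dog")]

if __name__ == "__main__":
    print(build_prompt("the dog eats rice", DATASTORE, DICTIONARY, MISTAKES))
```

The resulting string would be sent to an LLM as the prompt. Keeping retrieval, dictionary lookup, and the mistake log as separate inputs makes it easy to switch each component on or off when comparing the three prompting variants.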
