Abstract: With the advent of Large Language Models (LLMs), generating rule-based data for real-world applications has become more accessible. Due to the inherent ambiguity of natural language and the complexity of rule sets, especially in long contexts, LLMs often struggle to follow all specified rules, frequently omitting at least one. To enhance the reasoning and understanding of LLMs on long and complex contexts, we propose a novel prompting strategy Multi-Lingual Prompt, namely MLPrompt, which automatically translates the error-prone rule that an LLM struggles to follow into another language, thus drawing greater attention to it. Experimental results on public datasets across various tasks have shown MLPrompt can outperform state-of-the-art prompting methods such as Chain of Thought, Tree of Thought, and Self-Consistency. Additionally, we introduce a framework integrating MLPrompt with an auto-checking mechanism for structured data generation, with a specific case study in text-to-MIP instances. Further, we extend the proposed framework for text-to-SQL to demonstrate its generation ability towards structured data synthesis.

What problem does this paper attempt to address?

### The Problem the Paper Aims to Solve

This paper aims to address the issue of large language models (LLMs) struggling to follow all specified rules when dealing with long and complex rule sets. Specifically, the authors propose a new prompting strategy, Multilingual Prompting (MLPrompt), which improves LLMs' attention to and understanding of these rules by translating the error-prone rules into another language.

### Background and Motivation

1. **Challenges of LLMs in Generating Complex Data**:
   - LLMs often fail to fully adhere to all rules when generating data that follows specific rules, due to the inherent ambiguity of natural language and the complexity of the rule sets, especially in long contexts.
   - This results in low-quality generated data, particularly when structured data (such as JSON or SQL) needs to be generated.
2. **Limitations of Existing Methods**:
   - Existing multi-step reasoning methods (such as Chain of Thought, Tree of Thought, and Self-Consistency) can improve the quality of generated data, but they usually require multiple reasoning steps, increasing computational time and complexity.
   - These methods struggle to decompose tasks into independent parts when dealing with structured data, due to interdependencies between the parts.

### Proposed Method

1. **MLPrompt**:
   - **Multilingual Prompting**: Translating error-prone rules into a non-dominant language of the LLM enhances the LLM's attention to these rules.
   - **Automatic Checking Mechanism**: MLPrompt is combined with an automatic checking mechanism that iteratively updates prompts to ensure the generated data meets the input constraints.
2. **Experimental Validation**:
   - The authors conducted experiments on multiple public datasets to validate the effectiveness of MLPrompt on various tasks, including generating MIP instances and Text-to-SQL tasks.
   - Experimental results show that MLPrompt outperforms existing multi-step reasoning methods in generating structured data.

### Application Scenarios

1. **MIP Instance Generation**:
   - Mixed Integer Programming (MIP) is an important part of operations research, widely used in logistics, scheduling, and supply chain management.
   - The authors propose a general MIP instance generation pipeline, using MLPrompt to generate MIP instances that meet specific constraints.
2. **Text-to-SQL**:
   - Converting natural language queries into SQL statements is a more challenging task because detecting rule violations in SQL is more difficult.
   - The authors validated the effectiveness and generalization ability of MLPrompt in the Text-to-SQL task through experiments.

### Main Contributions

1. **Proposing MLPrompt**: A simple and effective multilingual prompting strategy that enhances LLMs' reasoning ability through cross-language understanding.
2. **First Attempt**: Utilizing LLMs to generate MIP instances, serving as a bridge between research datasets and industrial needs, and extendable to a general structured data generation pipeline.
3. **Experimental Validation**: Extensive experiments on the ComplexOR dataset show that MLPrompt outperforms existing prompting strategies in the MIP instance generation task. Additionally, experiments on the Text-to-SQL task further demonstrate the framework's broad application potential in other structured data generation tasks.

Through these contributions, the paper not only addresses the challenges of LLMs in generating complex structured data but also provides new directions and tools for future research.
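The interplay between multilingual prompting and the automatic checking mechanism described under Proposed Method can be illustrated with a minimal sketch. This is not the authors' implementation: `build_prompt`, `ml_prompt_loop`, `has_key`, `fake_generate`, and `fake_translate` are hypothetical names, the "LLM" and the "translator" are stand-in stubs, and the rule checkers are simple JSON key checks chosen only for illustration. The sketch assumes one checker per rule and re-prompts with the violated rules swapped for their translated versions.

```python
import json
from typing import Callable, List, Tuple


def build_prompt(task: str, rules: List[str]) -> str:
    """Join the task description and a numbered rule list into one prompt."""
    numbered = "\n".join(f"{i + 1}. {rule}" for i, rule in enumerate(rules))
    return f"{task}\nRules:\n{numbered}"


def ml_prompt_loop(
    task: str,
    rules: List[str],
    checkers: List[Callable[[str], bool]],
    generate: Callable[[str], str],
    translate: Callable[[str], str],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Generate, check each rule, and translate violated rules before retrying.

    Returns the last output and whether it passed every checker.
    """
    current = list(rules)
    output = ""
    for _ in range(max_rounds):
        output = generate(build_prompt(task, current))
        violated = [i for i, check in enumerate(checkers) if not check(output)]
        if not violated:
            return output, True
        for i in violated:
            # Replace only the error-prone rules with a translated version.
            current[i] = translate(rules[i])
    return output, False


def has_key(key: str) -> Callable[[str], bool]:
    """Hypothetical checker: the output must be a JSON object containing `key`."""
    def check(output: str) -> bool:
        try:
            data = json.loads(output)
        except json.JSONDecodeError:
            return False
        return isinstance(data, dict) and key in data
    return check


def fake_generate(prompt: str) -> str:
    """Stub standing in for an LLM call: it omits 'id' unless that rule is marked."""
    if "[translated]" in prompt:
        return json.dumps({"name": "demo", "id": 1})
    return json.dumps({"name": "demo"})


def fake_translate(rule: str) -> str:
    """Stub standing in for real translation into a non-dominant language."""
    return "[translated] " + rule


rules = ["Output a JSON object with a 'name' field.", "Include an integer 'id' field."]
checkers = [has_key("name"), has_key("id")]
output, ok = ml_prompt_loop("Generate a record.", rules, checkers, fake_generate, fake_translate)
print(ok, output)
```

In a real setting, `generate` would wrap an actual model call and `translate` an actual translation step; both are passed in as callables here so the loop itself stays independent of any particular API.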
