Patience Is The Key to Large Language Model Reasoning

Yijiong Yu

2024-11-20

Abstract: Recent advancements in the field of large language models, particularly through the Chain of Thought (CoT) approach, have demonstrated significant improvements in solving complex problems. However, existing models either tend to sacrifice detailed reasoning for brevity due to user preferences, or require extensive and expensive training data to learn complicated reasoning abilities, which limits their potential on complex tasks. To bridge this gap, following the concept of test-time scaling, we propose a simple method that encourages models to adopt a more patient reasoning style without introducing new knowledge or skills. Using a preference optimization approach, we generate detailed reasoning processes as positive examples and simple answers as negative examples, thereby training the model to favor thoroughness in its responses. Our results show a performance increase of up to 6.7% on GSM8k while training only on a lightweight dataset.
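The data-construction step described in the abstract (detailed reasoning as the positive example, a simple answer as the negative one) can be sketched as a pairing function. This is a minimal illustration under stated assumptions, not the author's pipeline: the `PreferencePair` type, the `build_preference_pairs` helper, and the `prompt`/`chosen`/`rejected` field names are hypothetical, and the detailed solutions are assumed to have been generated beforehand by a stronger LLM.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One preference example (hypothetical layout)."""
    prompt: str
    chosen: str    # detailed, step-by-step reasoning (positive example)
    rejected: str  # brief answer with little or no reasoning (negative example)


def build_preference_pairs(
    questions: list[str],
    detailed_solutions: list[str],
    short_answers: list[str],
) -> list[PreferencePair]:
    """Pair each question's patient solution (chosen) with its terse answer (rejected).

    Assumes the three input lists are aligned by index.
    """
    if not (len(questions) == len(detailed_solutions) == len(short_answers)):
        raise ValueError("inputs must be aligned and of equal length")
    pairs = []
    for q, detailed, short in zip(questions, detailed_solutions, short_answers):
        # Illustrative sanity check, not described in the source: skip pairs where
        # the "detailed" response is not actually longer than the short one.
        if len(detailed) <= len(short):
            continue
        pairs.append(PreferencePair(prompt=q, chosen=detailed, rejected=short))
    return pairs


if __name__ == "__main__":
    pairs = build_preference_pairs(
        ["Tom has 3 apples and buys 2 more. How many apples does he have?"],
        ["Tom starts with 3 apples. He buys 2 more, so 3 + 2 = 5. The answer is 5."],
        ["5"],
    )
    print(pairs[0].chosen)
```

Keeping the pairs as plain records makes it easy to hand them to whichever preference-optimization trainer is used downstream.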

Computation and Language

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: when current large language models (LLMs) solve complex problems, they either sacrifice detailed reasoning for the sake of brevity or require a large amount of high-quality training data to learn complex reasoning abilities, which limits their potential on complex tasks. The author proposes a simple method that improves the model's ability to solve complex problems by encouraging it to adopt a more patient reasoning style, without introducing new knowledge or skills. Specifically, the paper addresses the problem as follows:

1. **Generate detailed reasoning processes**: Use existing high-quality LLMs to generate detailed reasoning steps as positive examples, and simple answers as negative examples.
2. **Preference optimization**: Use preference optimization techniques (such as DPO) to train the model to favor detailed reasoning processes.
3. **Evaluation and verification**: Evaluate the method on mathematical problem-solving benchmarks (such as GSM8k and MATH) to verify its effectiveness.

Through these steps, the paper aims to improve the accuracy and reasoning ability of LLMs on complex problems, at the cost of some additional inference time.
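For readers unfamiliar with DPO, the standard per-pair objective behind the preference optimization step can be written in a few lines. This is a generic sketch of the usual DPO loss on precomputed sequence log-probabilities, not the author's training code; the function names and the `beta=0.1` default are assumptions chosen for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls drift from the reference model
) -> float:
    """Standard DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response given the
    prompt, under the trainable policy or the frozen reference model. Here
    "chosen" is the detailed reasoning and "rejected" is the short answer.
    """
    # How much more (or less) likely each response became relative to the reference.
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(-10.0, -5.0, -10.0, -5.0))
    # Policy now prefers the detailed answer more than the reference does: lower loss.
    print(dpo_loss(-8.0, -5.0, -10.0, -5.0))
```

Minimizing this loss over the preference pairs pushes the policy to raise the likelihood of the detailed solution relative to the terse one, measured against the reference model; a larger β penalizes drifting from the reference more weakly per unit of margin and makes the preference signal sharper.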
