Patience Is The Key to Large Language Model Reasoning

Yijiong Yu
2024-11-20
Abstract: Recent advances in large language models, particularly the Chain of Thought (CoT) approach, have significantly improved performance on complex problems. However, existing models either sacrifice detailed reasoning for brevity because of user preferences, or require extensive and expensive training data to learn complicated reasoning, which limits their potential on complex tasks. To bridge this gap, following the concept of test-time scaling, we propose a simple method that encourages models to adopt a more patient reasoning style without introducing new knowledge or skills. Using a preference optimization approach, we generate detailed reasoning processes as positive examples and simple answers as negative examples, thereby training the model to favor thoroughness in its responses. Our results show a performance increase of up to 6.7% on GSM8k after training on only a lightweight dataset.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the following problem: when solving complex problems, current large language models (LLMs) either sacrifice detailed reasoning for the sake of brevity or require large amounts of high-quality training data to learn complex reasoning abilities, which limits their potential on complex tasks. The author proposes a simple method that improves the model's problem-solving ability by encouraging a more patient reasoning style, without introducing new knowledge or skills. Specifically, the paper takes three steps:

1. **Generate detailed reasoning processes**: Use existing high-quality LLMs to generate detailed reasoning steps as positive examples and simple answers as negative examples.
2. **Preference optimization**: Use a preference optimization technique (such as DPO) to train the model to favor detailed reasoning processes.
3. **Evaluation and verification**: Evaluate on mathematical problem-solving benchmarks (such as GSM8k and MATH) to verify the effectiveness of the method.

Through these steps, the paper aims to improve the accuracy and reasoning ability of LLMs on complex problems, although this may increase inference time somewhat.
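The first two steps can be illustrated with a minimal Python sketch. It is not the author's implementation: the `PreferencePair` container, the `build_pair` helper, the toy problem text, and the `beta=0.1` default are all assumptions made for illustration. The loss follows the standard per-example DPO objective, which compares the policy's and a frozen reference model's sequence log-probabilities for the preferred and dispreferred responses.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One training example for preference optimization (hypothetical container)."""
    prompt: str
    chosen: str    # detailed, step-by-step solution (positive example)
    rejected: str  # concise solution (negative example)


def build_pair(problem: str, detailed: str, concise: str) -> PreferencePair:
    """Pair a patient solution (preferred) with a terse one (dispreferred)."""
    return PreferencePair(prompt=problem, chosen=detailed, rejected=concise)


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response under
    the policy (logp_*) or the frozen reference model (ref_logp_*).
    beta=0.1 is an assumed value, not one taken from the paper.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    x = beta * margin
    # Numerically stable form of log(1 + exp(-x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Toy example (made up for illustration, not from the paper's dataset).
pair = build_pair(
    problem="A shop sells pens at 3 for $2. How much do 12 pens cost?",
    detailed="12 pens is 12 / 3 = 4 groups of three. Each group costs $2, "
             "so the total is 4 * 2 = $8.",
    concise="$8.",
)

# If the policy already prefers the detailed answer more than the reference
# does, the loss falls below log(2); if it prefers the terse one, it rises.
print(dpo_loss(-5.0, -12.0, -8.0, -8.0) < math.log(2))
```

In this setup both responses reach the same final answer, so the preference signal rewards the reasoning style rather than new knowledge. In practice the log-probabilities would come from a language model, and a library trainer would handle batching and gradients.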