Abstract: Recent developments, particularly OpenAI's O1 model, have demonstrated the remarkable potential of Large Language Models (LLMs) for complex reasoning tasks. Through analysis of O1's outputs and provided sample Chain-of-Thought (CoT) demonstrations, we observe that it approaches problem-solving in a distinctly human-like manner, systematically brainstorming ideas, testing hypotheses, verifying results, and planning comprehensive solutions. These sophisticated reasoning capabilities remain notably absent in other state-of-the-art language models. In this paper, we hypothesize that this performance gap stems from the limited availability of high-quality reasoning process data in current training sets. We demonstrate that by constructing a specialized dataset focused on explicit problem-solving workflows ("worked solutions"), we can elicit substantially improved planning capabilities from existing models. Additionally, we propose the Reasoning Enhancement Loop (REL), a method for generating synthetic worked solutions.

What problem does this paper attempt to address?

This paper addresses the significant performance gap between current large language models (LLMs) and human experts on complex reasoning tasks. In particular, LLMs lack the ability to systematically explore the problem space, test hypotheses, and gradually refine solutions. Although methods such as traditional Chain-of-Thought (CoT) prompting and Self-Taught Reasoning can partially improve the reasoning ability of LLMs, they usually generate only linear solution paths and cannot solve problems in the exploratory way that human experts do.

The core hypothesis of the paper is that this performance gap mainly stems from the lack of high-quality reasoning process data in current training datasets. To test this hypothesis, the paper constructs a specialized dataset that contains detailed "worked solutions" and introduces the Reasoning Enhancement Loop (REL), a method for automatically generating more high-quality worked-solution data. The aim is to improve the planning and problem-solving abilities of existing language models.

Specifically, the main contributions of the paper include:

1. **Innovative dataset creation method**: Combines human expert knowledge and AI-assisted techniques to efficiently generate high-quality worked-solution data.
2. **ReasonSet dataset**: A detailed dataset of worked solutions, covering all stages from brainstorming to hypothesis testing to solution optimization.
3. **REL method**: A critic-generator pipeline that can automatically generate more high-quality worked-solution data.
4. **Empirical evidence**: Shows that models trained with worked-solution data are significantly superior to traditional methods in planning and problem-solving abilities, for example achieving an 18.9% performance improvement on AIME 2024.
5. **Proof-of-concept model**: Releases O1-Llama 3.2 3B, demonstrating how to elicit such reasoning abilities in LLMs.

Through these contributions, the paper not only addresses the shortcomings of current LLMs in complex reasoning tasks but also provides a scalable method for enhancing the reasoning ability of models.
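The summary describes REL only as a critic-generator pipeline, so the following is a minimal sketch of that general pattern rather than the paper's implementation. It assumes a generator that drafts or revises a worked solution and a critic that returns a score plus feedback; the names `Critique`, `refine`, `toy_generate`, `toy_critic`, the score threshold, and the round limit are all hypothetical and chosen purely for illustration. In practice both callables would wrap model calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Critique:
    score: float   # assumed quality score in [0, 1]; higher is better
    feedback: str  # text handed back to the generator for the next revision


# Hypothetical interfaces: (problem, previous draft, feedback) -> new draft,
# and (problem, draft) -> critique.
Generator = Callable[[str, Optional[str], str], str]
Critic = Callable[[str, str], Critique]


def refine(problem: str, generate: Generator, critique: Critic,
           threshold: float = 0.8, max_rounds: int = 3) -> Optional[str]:
    """Alternate generation and critique until a draft is accepted.

    Returns the first draft whose score reaches `threshold`, or None if
    no draft is accepted within `max_rounds` generate/critique rounds.
    """
    draft: Optional[str] = None
    feedback = ""
    for _ in range(max_rounds):
        draft = generate(problem, draft, feedback)
        review = critique(problem, draft)
        if review.score >= threshold:
            return draft
        feedback = review.feedback
    return None


# Toy stand-ins so the loop can run without any model (illustration only).
def toy_generate(problem: str, previous: Optional[str], feedback: str) -> str:
    if previous is None:
        return f"Brainstorm: restate the problem '{problem}'"
    return f"{previous}\nVerify: {feedback}"


def toy_critic(problem: str, solution: str) -> Critique:
    if "Verify:" in solution:
        return Critique(score=1.0, feedback="")
    return Critique(score=0.4, feedback="check the result against the problem")


if __name__ == "__main__":
    print(refine("2 + 2", toy_generate, toy_critic))
```

With the toy stand-ins, the first draft is rejected for lacking a verification step and the second is accepted. Accepted drafts could then be collected as synthetic worked solutions, while returning `None` lets the caller discard problems for which no acceptable draft was produced.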

REL: Working out is all you need

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning.

When Do Program-of-Thought Works for Reasoning?

Reasoning with Language Model is Planning with World Model

Reliable Reasoning Beyond Natural Language

Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data

LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

LeanReasoner: Boosting Complex Logical Reasoning with Lean

Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks

A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Thought-Like-Pro: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Thought

Break the Chain: Large Language Models Can be Shortcut Reasoners

On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models

Concise and Organized Perception Facilitates Reasoning in Large Language Models

Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models

Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning

Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning

Reasoning with Large Language Models, a Survey