Abstract: Currently OpenAI o1 has sparked a surge of interest in the study of large reasoning models (LRM). Building on this momentum, Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding -- which are well-suited for reinforcement learning (RL) -- but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?" Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies -- optimized for complex real-world problem-solving tasks.

What problem does this paper attempt to address?

This paper explores how large reasoning models (LRMs) can be applied in open domains, in particular how they can generalize to broader areas where there are no clear standard answers and rewards are difficult to quantify. Specifically, the paper approaches the problem through the following points:

1. **Enhancing reasoning ability**: The paper introduces a model named Marco-o1. The model covers fields such as mathematics, physics, and programming, where standard answers are relatively clear and reinforcement learning is a good fit, and it places particular emphasis on improving open-ended solutions.
2. **Technical means**:
   - **Chain-of-Thought (CoT) fine-tuning**: The base model undergoes full-parameter fine-tuning on a CoT dataset to strengthen its reasoning ability.
   - **Monte Carlo Tree Search (MCTS)**: The LLM is combined with MCTS, and the confidence of the model's output guides the search and expands the solution space.
   - **Reasoning action strategy**: New reasoning action strategies and reflection mechanisms are implemented, including exploring different action granularities within the MCTS framework and prompting the model to self-reflect, which significantly improves the model's ability to solve complex problems.
3. **Experimental verification**: Experiments on the MGSM (English and Chinese) datasets show Marco-o1's performance improvement on reasoning tasks. The paper also verifies the model's advantages in handling slang and colloquial expressions in translation tasks.
4. **Contribution points**:
   - **CoT data fine-tuning**: Marco-o1-CoT was developed through full-parameter fine-tuning on open-source CoT datasets and self-developed synthetic data, which improves the model's reasoning ability.
   - **MCTS expands the solution space**: With MCTS integrated, the confidence of the model's output guides the search and expands the solution space.
   - **Reasoning action strategy**: New reasoning action strategies and reflection mechanisms were implemented, which significantly enhance the model's ability to solve complex problems.
   - **Translation task application**: For the first time, a large reasoning model is applied to machine translation tasks to explore reasoning abilities in multilingual and translation settings.

Overall, through a series of technical innovations and experimental verifications, the paper shows how complex open-domain problems can be addressed by enhancing the reasoning ability of large reasoning models and expanding their range of application.
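The confidence-guided search mentioned above can be illustrated with a small sketch. The summary does not give the formula, so this sketch assumes one plausible reading: each generated token's confidence is its probability normalized against the model's top-k candidate tokens, and a rollout's reward is the mean of those confidences. The names `token_confidence`, `rollout_reward`, and `best_rollout` are hypothetical. The code shows only how candidate reasoning paths could be scored and ranked, not a full MCTS loop with selection, expansion, and backpropagation.

```python
import math
from typing import List, Sequence, Tuple

# One generated token: (log-prob of the chosen token, log-probs of the top-k candidates).
# Assumption: the top-k list includes the chosen token itself.
Token = Tuple[float, Sequence[float]]


def token_confidence(chosen_logprob: float, topk_logprobs: Sequence[float]) -> float:
    """Probability of the chosen token, renormalized over the top-k candidates."""
    denom = sum(math.exp(lp) for lp in topk_logprobs)
    return math.exp(chosen_logprob) / denom


def rollout_reward(tokens: Sequence[Token]) -> float:
    """Mean token confidence over one rollout (one candidate reasoning path)."""
    if not tokens:
        raise ValueError("a rollout must contain at least one token")
    total = sum(token_confidence(lp, topk) for lp, topk in tokens)
    return total / len(tokens)


def best_rollout(rollouts: List[Sequence[Token]]) -> int:
    """Index of the rollout with the highest reward (first one wins on ties)."""
    rewards = [rollout_reward(r) for r in rollouts]
    return max(range(len(rollouts)), key=rewards.__getitem__)


if __name__ == "__main__":
    # Illustrative (made-up) log-probs: the second path is generated with higher confidence.
    unsure = [(math.log(0.4), [math.log(0.4), math.log(0.3), math.log(0.3)])]
    confident = [(math.log(0.9), [math.log(0.9), math.log(0.1)])]
    print(best_rollout([unsure, confident]))
```

In a tree search, a reward of this kind would typically be attached to a node after a rollout and propagated back to its ancestors, so that branches the model generates with higher confidence are visited more often.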

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

OpenAI-o1 AB Testing: Does the o1 model really do good reasoning in math problem solving?

REL: Working out is all you need

Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

Evaluation of OpenAI o1: Opportunities and Challenges of AGI

On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability

OpenAI o1 System Card

Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths

Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search

When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1

MARCO: A Memory-Augmented Reinforcement Framework for Combinatorial Optimization

BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving

Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models

Reasoning with Language Model is Planning with World Model

Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance

LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning

Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning

Unlocking the Boundaries of Thought: A Reasoning Granularity Framework to Quantify and Optimize Chain-of-Thought