Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang
2024-11-22
Abstract: Currently OpenAI o1 has sparked a surge of interest in the study of large reasoning models (LRM). Building on this momentum, Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding, which are well-suited for reinforcement learning (RL), but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?" Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies, optimized for complex real-world problem-solving tasks.
Computation and Language
What problem does this paper attempt to address?
This paper explores how well large reasoning models (LRMs) work in open domains, and in particular how they can generalize to broader areas where there are no clear standard answers and rewards are difficult to quantify. The paper approaches this problem as follows:

1. **Enhancing reasoning ability**: The paper introduces a model named Marco-o1. Beyond fields such as mathematics, physics, and programming (where standard answers are relatively clear and well suited to reinforcement learning), the model places particular emphasis on open-ended problem solving.
2. **Technical approach**:
   - **Chain-of-Thought (CoT) fine-tuning**: The base model undergoes full-parameter fine-tuning on a CoT dataset to strengthen its reasoning ability.
   - **Monte Carlo Tree Search (MCTS)**: The LLM is combined with MCTS, and the confidence of the model's outputs is used to guide the search and expand the solution space.
   - **Reasoning action strategy**: New reasoning action strategies and a reflection mechanism are introduced, including exploring different action granularities within the MCTS framework and prompting the model to reflect on its own reasoning, which the authors report significantly improves its ability to solve complex problems.
3. **Experimental verification**: Experiments on the MGSM (English and Chinese) datasets show Marco-o1's performance gains on reasoning tasks. The paper also demonstrates the model's advantages in translating slang and colloquial expressions.
4. **Contributions**:
   - **Fine-tuning on CoT data**: Marco-o1-CoT is developed through full-parameter fine-tuning on open-source CoT datasets and self-developed synthetic data, improving the model's reasoning ability.
   - **Expanding the solution space with MCTS**: Integrating MCTS lets the confidence of the model's outputs guide the search and expand the solution space.
   - **Reasoning action strategy**: New reasoning action strategies and a reflection mechanism significantly enhance the model's ability to solve complex problems.
   - **Application to translation**: The authors present this as the first application of a large reasoning model to machine translation, exploring reasoning ability in multilingual and translation settings.

Overall, through these technical innovations and experiments, the paper shows how strengthening the reasoning ability of large reasoning models and broadening their scope can help solve complex open-domain problems.
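The summary says that the confidence of the model's outputs guides the MCTS search. The sketch below shows one plausible way to turn token log-probabilities into such a value. It assumes, rather than takes from this summary, that each generated token's confidence is the softmax of its log-probability over the top-k candidate log-probabilities at that position, and that a rollout's reward is the mean of these token confidences. `token_confidence`, `rollout_reward`, and the input format are hypothetical.

```python
import math
from typing import Sequence, Tuple


def token_confidence(chosen_logprob: float, top_logprobs: Sequence[float]) -> float:
    """Softmax-normalized confidence of the chosen token among the top-k candidates.

    Assumption: `top_logprobs` holds the log-probabilities of the k most likely
    tokens at this position, and the chosen token is one of them.
    """
    # Shift by the max log-probability for numerical stability; the shift cancels out.
    m = max(top_logprobs)
    denom = sum(math.exp(lp - m) for lp in top_logprobs)
    return math.exp(chosen_logprob - m) / denom


def rollout_reward(tokens: Sequence[Tuple[float, Sequence[float]]]) -> float:
    """Mean token confidence over a rollout, used here as the search node's value.

    Each element of `tokens` is (chosen_logprob, top_k_logprobs) for one generated token.
    """
    if not tokens:
        return 0.0
    return sum(token_confidence(lp, tops) for lp, tops in tokens) / len(tokens)


if __name__ == "__main__":
    # Two generated tokens with made-up log-probabilities (illustrative only).
    rollout = [
        (math.log(0.5), [math.log(0.5), math.log(0.25), math.log(0.25)]),
        (math.log(0.6), [math.log(0.6), math.log(0.2)]),
    ]
    print(rollout_reward(rollout))  # 0.5 and 0.75 average to about 0.625
```

Normalizing over only the top-k candidates, rather than the full vocabulary, makes the score reflect how clearly the chosen token beat its closest alternatives.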
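To illustrate how a confidence-style value can steer the search the summary describes, here is a compact, generic UCT-based MCTS over sequences of reasoning steps. This is a textbook MCTS sketch, not the authors' implementation: `propose` (which in practice would ask the LLM for candidate next steps), `reward` (for example, a mean-confidence score), and `is_terminal` are hypothetical callables, and the exploration constant and iteration count are arbitrary placeholders.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

State = Tuple[str, ...]  # reasoning steps taken so far


@dataclass
class Node:
    state: State
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def uct(self, c: float) -> float:
        # Unvisited children are tried first.
        if self.visits == 0:
            return float("inf")
        mean = self.value_sum / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def mcts(
    propose: Callable[[State], Sequence[str]],  # hypothetical: LLM proposes next steps (non-empty for non-terminal states)
    reward: Callable[[State], float],           # hypothetical: e.g. mean token confidence of a finished path
    is_terminal: Callable[[State], bool],
    iterations: int = 200,
    c: float = 1.4,
    seed: int = 0,
) -> State:
    rng = random.Random(seed)
    root = Node(state=())
    for _ in range(iterations):
        node = root
        # 1. Selection: follow the highest-UCT child down to a leaf.
        while node.children:
            node = max(node.children, key=lambda ch: ch.uct(c))
        # 2. Expansion: add one child per proposed step, then pick one at random.
        if not is_terminal(node.state):
            node.children = [Node(node.state + (s,), parent=node) for s in propose(node.state)]
            node = rng.choice(node.children)
        # 3. Simulation: finish the reasoning path with random proposals.
        state = node.state
        while not is_terminal(state):
            state = state + (rng.choice(list(propose(state))),)
        value = reward(state)
        # 4. Backpropagation: update visit counts and value sums up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Read out the answer by following the most-visited child at each level.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.state


if __name__ == "__main__":
    # Toy stand-ins: steps are "a"/"b", paths have 3 steps, reward = fraction of "b".
    best = mcts(
        propose=lambda s: ["a", "b"],
        reward=lambda s: s.count("b") / len(s),
        is_terminal=lambda s: len(s) == 3,
    )
    print(best)
```

Reading out the most-visited path, rather than the one with the highest mean value, is a common MCTS convention that is less sensitive to a few lucky rollouts.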
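The reasoning action strategy in the summary varies what counts as a single action in the search and adds a self-reflection step. A minimal sketch, assuming actions are either whole reasoning steps (here, hypothetically, non-empty lines) or fixed-size "mini-steps" of tokens, and that reflection is triggered by appending a self-critique cue to the trace. The helper names, the line-based step split, the default chunk size, and the exact prompt wording are illustrative assumptions.

```python
from typing import List, Sequence

# Hypothetical reflection cue appended to a reasoning trace to prompt self-checking.
REFLECTION_PROMPT = "Wait! Maybe I made some mistakes! I need to rethink from scratch."


def step_actions(text: str) -> List[str]:
    """Step-level granularity: each non-empty line is treated as one action (assumption)."""
    return [line for line in text.splitlines() if line.strip()]


def mini_step_actions(tokens: Sequence[str], size: int = 32) -> List[List[str]]:
    """Mini-step granularity: split a token sequence into chunks of `size` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def with_reflection(trace: str) -> str:
    """Append the reflection cue so the model re-examines its previous reasoning."""
    return trace.rstrip() + "\n" + REFLECTION_PROMPT


if __name__ == "__main__":
    trace = "Step 1: 3 apples + 4 apples = 7 apples\nStep 2: 7 apples - 2 eaten = 5 apples"
    print(step_actions(trace))  # two step-level actions
    print(mini_step_actions(trace.split(), size=4))
    print(with_reflection(trace))
```

Smaller mini-steps give the search more branching points per solution at the cost of more model calls, so the chunk size is a trade-off rather than a fixed choice.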