Abstract: The reasoning steps generated by LLMs might be incomplete, as they mimic logical leaps common in everyday communication found in their pre-training data: underlying rationales are frequently left implicit (unstated). To address this challenge, we introduce RATIONALYST, a model for process-supervision of reasoning based on pre-training on a vast collection of rationale annotations extracted from unlabeled data. We extract 79k rationales from a web-scale unlabelled dataset (the Pile) and a combination of reasoning datasets with minimal human intervention. This web-scale pre-training for reasoning allows RATIONALYST to consistently generalize across diverse reasoning tasks, including mathematical, commonsense, scientific, and logical reasoning. Fine-tuned from LLaMa-3-8B, RATIONALYST improves the accuracy of reasoning by an average of 3.9% on 7 representative reasoning benchmarks. It also demonstrates superior performance compared to significantly larger verifiers like GPT-4 and similarly sized models fine-tuned on matching training sets.

What problem does this paper attempt to address?

This paper addresses the logical leaps and unstated rationales in the reasoning steps generated by large language models (LLMs). Specifically:

1. **Logical Leaps and Implicit Rationales**: When generating reasoning steps, LLMs may imitate the logical leaps common in everyday communication, so some underlying reasoning steps are never stated. These implicit rationales are crucial for accurate reasoning but are often overlooked by existing models.
2. **Insufficient Reasoning Accuracy**: As a result, the reasoning accuracy of LLMs can suffer across reasoning tasks such as mathematical, commonsense, scientific, and logical reasoning.

To address these problems, the paper introduces **RATIONALYST**, a model that supervises the reasoning process and is pre-trained on a large collection of implicit-rationale annotations extracted from unlabeled data. The goal of RATIONALYST is to improve the performance of LLMs on different reasoning tasks and to make their reasoning steps more complete and accurate.

### Main Contributions of RATIONALYST

- **Proposing a New Model, RATIONALYST**: The model enhances the interpretability and performance of LLM reasoning by pre-training on implicit rationales extracted from unlabeled text.
- **Demonstrating the Generalization Ability of RATIONALYST**: Experiments show that RATIONALYST improves accuracy by an average of 3.9% on 7 representative reasoning benchmarks and, on some tasks, outperforms significantly larger verifiers (such as GPT-4) and similarly sized models fine-tuned on matching training sets.

### Method Overview

The construction and use of RATIONALYST are divided into three stages:

1. **Large-Scale Implicit Rationale Extraction**: Extract implicit rationales from large-scale unlabeled datasets (such as The Pile) and from existing reasoning datasets, then retain the useful ones through a filtering mechanism.
2. **Training RATIONALYST**: Use the extracted and filtered rationales as target outputs to train RATIONALYST, so that it can generate implicit rationales that help with subsequent reasoning steps.
3. **Supervision during Reasoning**: At inference time, RATIONALYST provides implicit rationales to other reasoning models (such as MAgent) to guide them toward more accurate reasoning steps. This supervision can be applied either explicitly or implicitly.

In this way, RATIONALYST fills in the logical steps that existing LLMs omit during reasoning, improving the accuracy and reliability of the result.
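The three stages above can be illustrated with a small, self-contained Python sketch. It is only a toy under stated assumptions, not the paper's implementation: `Candidate`, `token_overlap`, `filter_rationales`, `build_training_pairs`, and `pick_next_step` are hypothetical names, the token-overlap score stands in for whatever model-based scoring is actually used, and the filtering criterion (keep a rationale only if it makes the following text easier to predict) is an assumption, since the summary only mentions "a filtering mechanism".

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    context: str    # text preceding the implicit rationale
    rationale: str  # proposed implicit rationale
    next_text: str  # text that follows the context in the source document


def token_overlap(a: str, b: str) -> float:
    """Toy stand-in for a model-based score: fraction of b's tokens found in a."""
    a_tokens = set(a.lower().split())
    b_tokens = b.lower().split()
    if not b_tokens:
        return 0.0
    return sum(t in a_tokens for t in b_tokens) / len(b_tokens)


def filter_rationales(
    cands: List[Candidate],
    score: Callable[[str, str], float] = token_overlap,
) -> List[Candidate]:
    """Stage 1 (assumed criterion): keep a rationale only if appending it to
    the context raises the score of the text that follows."""
    kept = []
    for c in cands:
        base = score(c.context, c.next_text)
        with_rationale = score(c.context + " " + c.rationale, c.next_text)
        if with_rationale > base:
            kept.append(c)
    return kept


def build_training_pairs(kept: List[Candidate]) -> List[Tuple[str, str]]:
    """Stage 2: (input, target) pairs with the rationale as the target output."""
    return [(c.context, c.rationale) for c in kept]


def pick_next_step(
    trajectory: str,
    candidate_steps: List[str],
    generate_rationale: Callable[[str], str],
    score: Callable[[str, str], float] = token_overlap,
) -> str:
    """Stage 3 (one possible form of supervision): generate a rationale for
    the trajectory so far and pick the candidate step that best matches it."""
    rationale = generate_rationale(trajectory)
    return max(candidate_steps, key=lambda step: score(rationale, step))


if __name__ == "__main__":
    context = "Tom has 3 apples and buys 2 more."
    cands = [
        Candidate(context, "Adding 3 and 2 gives 5 apples in total.",
                  "So he has 5 apples in total."),
        Candidate(context, "The sky is blue.", "So he has 5 apples in total."),
    ]
    kept = filter_rationales(cands)
    print(build_training_pairs(kept))
    # A fixed lambda plays the role of the trained rationale generator here.
    print(pick_next_step(
        context,
        ["So he has 6 apples in total.", "So he has 5 apples in total."],
        lambda _traj: "Adding 3 and 2 gives 5 apples in total.",
    ))
```

In a real system the scoring function would presumably come from a language model rather than token overlap, and `generate_rationale` would call the trained rationale model; both are left as plain callables here so the control flow of the three stages stays visible.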

RATIONALYST: Pre-training Process-Supervision for Improving Reasoning

P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains

IDOL: Indicator-oriented Logic Pre-training for Logical Reasoning

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

Rational Metareasoning for Large Language Models

Enhancing the Rationale-Input Alignment for Self-explaining Rationalization

Boosting Deductive Reasoning with Step Signals In RLHF

Improving Language Model Reasoning with Self-motivated Learning

Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning

Tailoring Self-Rationalizers with Multi-Reward Distillation

Self-Training Meets Consistency: Improving LLMs' Reasoning With Consistency-Driven Rationale Evaluation

LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning

Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples

Optimizing Language Model's Reasoning Abilities with Weak Supervision

Democratizing Reasoning Ability: Tailored Learning from Large Language Model

Improved Logical Reasoning of Language Models via Differentiable Symbolic Programming

ReasoningRank: Teaching Student Models to Rank through Reasoning-Based Knowledge Distillation

Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up

LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models