RATIONALYST: Pre-training Process-Supervision for Improving Reasoning

Dongwei Jiang, Guoxuan Wang, Yining Lu, Andrew Wang, Jingyu Zhang, Chuyu Liu, Benjamin Van Durme, Daniel Khashabi
2024-10-02
Abstract: The reasoning steps generated by LLMs might be incomplete, as they mimic logical leaps common in everyday communication found in their pre-training data: underlying rationales are frequently left implicit (unstated). To address this challenge, we introduce RATIONALYST, a model for process-supervision of reasoning based on pre-training on a vast collection of rationale annotations extracted from unlabeled data. We extract 79k rationales from a web-scale unlabelled dataset (the Pile) and a combination of reasoning datasets with minimal human intervention. This web-scale pre-training for reasoning allows RATIONALYST to consistently generalize across diverse reasoning tasks, including mathematical, commonsense, scientific, and logical reasoning. Fine-tuned from LLaMa-3-8B, RATIONALYST improves the accuracy of reasoning by an average of 3.9% on 7 representative reasoning benchmarks. It also demonstrates superior performance compared to significantly larger verifiers like GPT-4 and similarly sized models fine-tuned on matching training sets.
Artificial Intelligence, Computation and Language
### What problems does this paper attempt to solve?

This paper addresses the logical leaps and unstated rationales in the reasoning steps generated by large language models (LLMs). Specifically:

1. **Logical Leaps and Implicit Rationales**: When generating reasoning steps, existing LLMs may imitate the logical leaps common in everyday communication, so some intermediate reasoning steps are never stated. These implicit rationales are crucial for accurate reasoning but are often overlooked by existing models.
2. **Insufficient Reasoning Accuracy**: Because of these gaps, the reasoning accuracy of LLMs may suffer across reasoning tasks such as mathematical, commonsense, scientific, and logical reasoning.

To address these problems, the paper introduces **RATIONALYST**, a model that supervises the reasoning process after being pre-trained on a large number of implicit rationale annotations extracted from unlabeled data. The goal of RATIONALYST is to improve the performance of LLMs on different reasoning tasks and to make the reasoning steps more complete and accurate.

### Main Contributions of RATIONALYST

- **Proposing a New Model, RATIONALYST**: This model enhances the interpretability and performance of LLMs in the reasoning process by pre-training on implicit rationales extracted from unlabeled text data.
- **Demonstrating the Generalization Ability of RATIONALYST**: Experimental results show that RATIONALYST improves accuracy by an average of 3.9% on 7 representative reasoning benchmarks and, on some tasks, outperforms larger verifiers (such as GPT-4) and other models fine-tuned on similar training sets.

### Method Overview

The construction and use of RATIONALYST are divided into three stages:

1. **Large-Scale Implicit Rationale Extraction**: Extract implicit rationales from large-scale unlabeled datasets (such as The Pile) and existing reasoning datasets, and retain the useful ones through a filtering mechanism.
2. **Training RATIONALYST**: Use the extracted and filtered implicit rationales as target outputs to train the RATIONALYST model so that it can generate implicit rationales that help with subsequent reasoning steps.
3. **Supervision during Reasoning**: During the reasoning process, RATIONALYST provides implicit rationales to other reasoning models (such as MAgent) to guide them toward more accurate reasoning steps. This can be done through explicit or implicit supervision.

In this way, RATIONALYST can fill in the logical steps that existing LLMs omit during reasoning, thereby improving the accuracy and reliability of reasoning.
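The extraction-with-filtering and supervision stages above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the names `Rationale`, `token_overlap`, `filter_rationales`, and `select_next_step`, the threshold value, and the example sentences are hypothetical and do not come from the paper. In particular, the word-overlap score is only a stand-in for whatever model-based signal the real filter and supervisor use, and the training stage is omitted entirely.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Rationale:
    """A hypothetical container for one extracted rationale."""
    context: str  # text that precedes the point where the rationale applies
    text: str     # the implicit rationale itself


def token_overlap(a: str, b: str) -> float:
    """Toy score: Jaccard overlap of lowercase, whitespace-split word sets.

    This is a stand-in for a model-based score, not the paper's method.
    """
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def filter_rationales(
    candidates: Sequence[Rationale],
    following_text: str,
    score: Callable[[str, str], float] = token_overlap,
    threshold: float = 0.2,  # assumed cut-off, chosen only for this example
) -> List[Rationale]:
    """Extraction-stage sketch: keep rationales that score well against the text that follows."""
    return [r for r in candidates if score(r.text, following_text) >= threshold]


def select_next_step(
    rationale: str,
    candidate_steps: Sequence[str],
    score: Callable[[str, str], float] = token_overlap,
) -> str:
    """Supervision-stage sketch: pick the candidate step that agrees most with the rationale."""
    return max(candidate_steps, key=lambda step: score(rationale, step))


# Hypothetical example data.
context = "Tom has 3 apples and buys 2 more."
candidates = [
    Rationale(context, "buying more apples means adding to the total"),
    Rationale(context, "the weather is sunny today"),
]
kept = filter_rationales(candidates, following_text="Adding the apples gives a total of 5.")

steps = ["3 + 2 = 5, adding the apples to the total", "3 - 2 = 1"]
best = select_next_step(kept[0].text, steps)
print(best)
```

Passing the score as a parameter keeps the sketch independent of any particular scoring model: a likelihood-based or learned scorer could be swapped in for `token_overlap` without changing the two stage functions.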