Cumulative Reasoning with Large Language Models

Yifan Zhang, Jingqin Yang, Yang Yuan, Andrew Chi-Chih Yao
2024-04-02
Abstract: Despite the recent advancements in language models (LMs), their ability to solve complex problems remains limited. This paper introduces Cumulative Reasoning (CR), a novel approach that utilizes LMs cumulatively and iteratively, mirroring human thought processes for problem-solving. CR decomposes tasks into smaller, manageable components and leverages previous propositions for effective composition, significantly enhancing problem-solving capabilities. We demonstrate CR's superiority through several complex reasoning tasks: it outperforms existing methods in logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, marking a 24% improvement over the prior state-of-the-art. Additionally, CR sets new state-of-the-art on the MATH dataset, achieving a 4.2% increase from previous methods and a 43% relative improvement in the most challenging problems. By extending CR to incorporate a code environment without external aids like retrieval or web browsing, we further harness the computational and logical reasoning capabilities of LMs, achieving a remarkable 72.2% accuracy on the MATH dataset and outperforming the PAL/PoT method by 38.8%. Our work not only sets new state-of-the-art but also paves the way toward more sophisticated AI reasoning methods. The code is available at https://github.com/iiis-ai/cumulative-reasoning.
Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the limited ability of large language models (LLMs) to solve complex problems. Despite significant recent progress, language models still struggle to give stable, accurate answers on highly complex tasks, particularly in logical reasoning and mathematical problem-solving. To address this, the paper introduces Cumulative Reasoning (CR), a method that uses language models iteratively and cumulatively to mirror the human thought process in problem-solving.

Specifically, the main contributions of the paper are:

1. **Proposing the Cumulative Reasoning (CR) framework**: CR decomposes a task into smaller, more manageable components and composes them by building on previously established propositions.
2. **Empirical evaluation**: CR is evaluated on several complex reasoning tasks: logical inference, the Game of 24, and mathematical problem-solving. On logical inference tasks, CR improves on existing methods by up to 9.3% and reaches 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, CR reaches 98% accuracy, a 24% improvement over the prior state of the art. On the MATH dataset, CR improves on previous methods by 4.2%, with a 43% relative improvement on the most difficult problems.
3. **Extending CR to incorporate a code environment**: Integrating CR with a Python code environment, without external aids such as retrieval or web browsing, further strengthens computational and logical reasoning. This variant reaches 72.2% accuracy on the MATH dataset and outperforms the PAL/PoT method by 38.8%.

Overall, the paper advances the use of language models on complex problems and paves the way for more sophisticated AI reasoning methods.
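The decompose-and-accumulate idea described above can be sketched on the Game of 24. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `propose`, `verify`, and `solve` are hypothetical, and exhaustive enumeration with exact arithmetic stands in for the language-model calls that CR would use to suggest and check intermediate steps. Each accepted step is appended to a running list, so later steps build on earlier verified ones.

```python
import operator
from fractions import Fraction
from itertools import combinations

# Exact arithmetic avoids floating-point error when dividing.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def propose(numbers):
    """Hypothetical proposer: yield (step, remaining_numbers) candidates.

    A step is (x, op, y, value). Enumeration stands in for an LM here.
    """
    for i, j in combinations(range(len(numbers)), 2):
        a, b = numbers[i], numbers[j]
        rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
        candidates = [(a, "+", b), (a, "*", b), (a, "-", b), (b, "-", a)]
        if b != 0:
            candidates.append((a, "/", b))
        if a != 0:
            candidates.append((b, "/", a))
        for x, op, y in candidates:
            value = OPS[op](x, y)
            yield (x, op, y, value), rest + [value]


def verify(step):
    """Hypothetical verifier: independently recompute the proposed step."""
    x, op, y, value = step
    if op == "/" and y == 0:
        return False
    return OPS[op](x, y) == value


def solve(numbers, target=24):
    """Accumulate verified steps until one number equal to target remains."""
    start = [Fraction(n) for n in numbers]
    stack = [(start, [])]  # (remaining numbers, accumulated verified steps)
    while stack:
        nums, steps = stack.pop()
        if len(nums) == 1:
            if nums[0] == target:
                return steps
            continue
        for step, new_nums in propose(nums):
            if verify(step):
                stack.append((new_nums, steps + [step]))
    return None  # no sequence of steps reaches the target


if __name__ == "__main__":
    for x, op, y, value in solve([4, 9, 10, 13]) or []:
        print(f"{x} {op} {y} = {value}")
```

For an input such as `[4, 9, 10, 13]`, `solve` returns three verified steps whose final value is 24; for an input with no solution, such as `[1, 1, 1, 1]`, it returns `None`. In this toy the verifier trivially agrees with the proposer; in an LM-driven setting the two roles would be separate, fallible calls, which is where accumulating only verified steps matters.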