Abstract: Despite the recent advancements in language models (LMs), their ability to solve complex problems remains limited. This paper introduces Cumulative Reasoning (CR), a novel approach that utilizes LMs cumulatively and iteratively, mirroring human thought processes for problem-solving. CR decomposes tasks into smaller, manageable components and leverages previous propositions for effective composition, significantly enhancing problem-solving capabilities. We demonstrate CR's superiority through several complex reasoning tasks: it outperforms existing methods in logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, marking a 24% improvement over the prior state-of-the-art. Additionally, CR sets new state-of-the-art on the MATH dataset, achieving a 4.2% increase from previous methods and a 43% relative improvement in the most challenging problems. By extending CR to incorporate a code environment without external aids like retrieval or web browsing, we further harness the computational and logical reasoning capabilities of LMs, achieving a remarkable 72.2% accuracy on the MATH dataset and outperforming the PAL/PoT method by 38.8%. Our work not only sets new state-of-the-art but also paves the way toward more sophisticated AI reasoning methods. The code is available at https://github.com/iiis-ai/cumulative-reasoning.

What problem does this paper attempt to address?

The paper addresses the limited ability of large language models (LLMs) to solve complex problems. Despite significant recent advances, language models still struggle to give stable, accurate answers on highly complex tasks, particularly in logical reasoning and mathematical problem-solving. To address this, the paper introduces Cumulative Reasoning (CR), a method that uses language models iteratively and cumulatively to mirror the human thought process in problem-solving. The main contributions of the paper are:

1. **Proposing the Cumulative Reasoning (CR) framework**: CR decomposes a task into smaller, more manageable components and composes them by building on previously established propositions.
2. **Empirical evaluation**: CR is evaluated on several complex reasoning tasks: logical inference, the Game of 24, and mathematical problem-solving. On logical inference tasks it improves on existing methods by up to 9.3% and reaches 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24 it reaches 98% accuracy, a 24% improvement over the prior state of the art. On the MATH dataset it improves on previous methods by 4.2%, with a 43% relative improvement on the most challenging problems.
3. **Extending CR to incorporate a code environment**: Combined with a Python code environment, and without external aids such as retrieval or web browsing, CR reaches 72.2% accuracy on the MATH dataset, outperforming the PAL/PoT method by 38.8%.

Overall, the paper advances the use of language models on complex problems and paves the way toward more sophisticated AI reasoning methods.
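To make the idea of accumulating intermediate results concrete, here is a minimal sketch on the Game of 24. It is not the authors' implementation: the function names `propose` and `solve_24` are hypothetical, and the proposer is a brute-force enumeration standing in for a language model. What it illustrates is only the general pattern of deriving a new intermediate result from two existing ones, checking it exactly, and adding the resulting state to a growing pool that later steps build on.

```python
from fractions import Fraction
from itertools import combinations


def propose(numbers):
    """Stand-in for an LM proposer (assumption): yield every state reachable
    by combining two of the current numbers with one arithmetic operation."""
    for i, j in combinations(range(len(numbers)), 2):
        (a, ea), (b, eb) = numbers[i], numbers[j]
        rest = [numbers[k] for k in range(len(numbers)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        # Division is only a valid step when the divisor is nonzero.
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))
        for value, expr in candidates:
            yield rest + [(value, expr)]


def solve_24(inputs, target=24):
    """Accumulate distinct intermediate states until one reduces to the target.
    Returns an expression string, or None if no combination reaches the target."""
    start = [(Fraction(n), str(n)) for n in inputs]
    frontier = [start]   # states still to be expanded
    seen = set()         # multisets of values already accumulated
    while frontier:
        state = frontier.pop()
        if len(state) == 1:
            if state[0][0] == target:
                return state[0][1]
            continue
        for nxt in propose(state):
            key = tuple(sorted(value for value, _ in nxt))
            if key not in seen:  # keep each distinct intermediate state once
                seen.add(key)
                frontier.append(nxt)
    return None
```

Calling `solve_24([4, 9, 10, 13])` returns an expression string that evaluates to 24, while an unsolvable input such as `[1, 1, 1, 1]` returns `None`. Exact `Fraction` arithmetic is used so that intermediate divisions are checked without floating-point error.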

Cumulative Reasoning with Large Language Models

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning

CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge

When Do Program-of-Thought Works for Reasoning?

LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning

Concise and Organized Perception Facilitates Reasoning in Large Language Models

On Memorization of Large Language Models in Logical Reasoning

Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning

Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning

CLR-Bench: Evaluating Large Language Models in College-level Reasoning

Reasoning with Large Language Models, a Survey

Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond

Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks

Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration

Improving Causal Reasoning in Large Language Models: A Survey

Meaningful Learning: Enhancing Abstract Reasoning in Large Language Models via Generic Fact Guidance

Towards Reasoning in Large Language Models: A Survey

Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models

Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization

The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models

Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance