Abstract: Large language models (LLMs) have recently demonstrated an impressive ability to perform arithmetic and symbolic reasoning tasks when provided with a few examples at test time ("few-shot prompting"). Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both for understanding the problem description by decomposing it into steps and for solving each step of the problem. While LLMs seem to be adept at this sort of step-by-step decomposition, they often make logical and arithmetic mistakes in the solution part, even when the problem is decomposed correctly. In this paper, we present Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter. With PAL, decomposing the natural language problem into runnable steps remains the only learning task for the LLM, while solving is delegated to the interpreter. We demonstrate this synergy between a neural LLM and a symbolic interpreter across 13 mathematical, symbolic, and algorithmic reasoning tasks from BIG-Bench Hard and other benchmarks. In all these natural language reasoning tasks, generating code using an LLM and reasoning using a Python interpreter leads to more accurate results than much larger models. For example, PAL using Codex achieves state-of-the-art few-shot accuracy on the GSM8K benchmark of math word problems, surpassing PaLM-540B which uses chain-of-thought by absolute 15% top-1. Our code and data are publicly available at http://reasonwithpal.com/ .
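
The division of labor described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: no model is called, the "generated" program is a hard-coded stand-in for text an LLM might produce, the word problem is a toy example, and the helper name `run_generated_program` is hypothetical rather than taken from the authors' code.

```python
# Stand-in for a program an LLM might generate for the toy problem:
# "Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
#  Each can has 3 balls. How many tennis balls does he have now?"
# (Hard-coded for illustration; no model is queried.)
GENERATED_PROGRAM = """
# Roger starts with 5 tennis balls.
tennis_balls = 5
# He buys 2 cans with 3 balls each.
bought_balls = 2 * 3
# The answer is the total number of balls.
answer = tennis_balls + bought_balls
"""


def run_generated_program(source: str, result_name: str = "answer"):
    """Execute program text in a fresh namespace and return one variable.

    Hypothetical helper: the interpreter, not the language model,
    performs the arithmetic. exec() must only be used on trusted or
    sandboxed code.
    """
    namespace = {}
    exec(source, namespace)
    return namespace[result_name]


result = run_generated_program(GENERATED_PROGRAM)
print(result)
```

In this sketch the comments inside the generated program play the role of the natural-language reasoning steps, while the assignments carry the computation that the interpreter executes.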

What problem does this paper attempt to address?

This paper addresses the accuracy of large language models (LLMs) on arithmetic and logical reasoning. With few-shot prompting, LLMs decompose arithmetic and symbolic reasoning problems into steps well, but they often make logical and arithmetic mistakes when solving those steps, even when the decomposition is correct. The paper proposes Program-Aided Language models (PAL): the LLM generates program code as its intermediate reasoning steps, and a Python interpreter executes that code to produce the answer. According to the paper, this improves performance across mathematical, symbolic, and algorithmic reasoning tasks and is more robust than existing methods such as chain-of-thought (CoT) on problems containing large numbers. Specifically, PAL with Codex achieves state-of-the-art few-shot accuracy on the GSM8K benchmark, 15 percentage points (absolute, top-1) above PaLM-540B with chain-of-thought. PAL is also more stable on harder math problems: on the GSM-HARD dataset, its performance degrades much less than that of other methods.
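
One plausible reason for the robustness to large numbers is that a program's structure does not depend on the size of its operands, and Python integers have arbitrary precision. The sketch below illustrates this under stated assumptions: the function `total_cost` and all numbers are made up for illustration and are not drawn from GSM8K or GSM-HARD.

```python
def total_cost(unit_price: int, quantity: int, shipping_fee: int) -> int:
    """Hypothetical program for a toy word problem:
    'Each item costs unit_price, you buy quantity items and pay a
    shipping fee. What is the total cost?'
    """
    return unit_price * quantity + shipping_fee


# Small operands, as in a typical grade-school problem.
small = total_cost(12, 7, 5)

# Same program, much larger operands (illustrative values only).
# The interpreter computes the exact result regardless of magnitude.
large = total_cost(1_234_567, 7_654_321, 5)

print(small)
print(large)
```

The program text is identical in both cases; only the operands change, so the correctness of the arithmetic no longer depends on the model computing large products token by token.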

PAL: Program-aided Language Models

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning

Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning

Solving Math Word Problems by Combining Language Models With Symbolic Solvers

Reliable Reasoning Beyond Natural Language

LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning

Investigating Symbolic Capabilities of Large Language Models

Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control

Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions

LPML: LLM-Prompting Markup Language for Mathematical Reasoning

PaLM-E: An Embodied Multimodal Language Model

MathPrompter: Mathematical Reasoning using Large Language Models

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

Proof Automation with Large Language Models

Learning to Program with Natural Language

LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the Importance of Object-based Representations

AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving