Abstract: Large Language Models have demonstrated remarkable abilities in reasoning and planning by breaking down complex problems into sequential steps. Despite their success in domains such as mathematical problem-solving and coding, LLMs struggle to plan reliably and optimally because autoregressive decoding is inherently myopic. This paper revisits LLM reasoning from an optimal-control perspective, proposing a novel method, Predictive-Decoding, that leverages Model Predictive Control to enhance planning accuracy. By re-weighting LLM distributions based on foresight trajectories, Predictive-Decoding aims to mitigate early errors and promote non-myopic planning. Our experiments show significant improvements across a wide range of math, coding, and agent tasks. Furthermore, Predictive-Decoding demonstrates computational efficiency, outperforming search baselines with reduced computational resources. This study provides insights into optimizing LLM planning capabilities.

What problem does this paper attempt to address?

The paper addresses the myopia of large language models (LLMs) in reasoning and planning. Although LLMs perform well on tasks such as mathematical problem-solving and programming, autoregressive decoding tends to commit to the best-looking choice at the current step and ignore its effect on subsequent steps, which can lead to irreversible errors and sub-optimal plans. Specifically, the paper explores two key questions:

1. **Can LLMs actively avoid wrong steps without actually making them?**
2. **To what extent can LLM-based planning strategies achieve optimality?**

To answer these questions, the paper re-examines LLM reasoning from the perspective of optimal control and proposes a new method, **Predictive-Decoding**. The method uses Model Predictive Control (MPC) to improve planning accuracy: by re-weighting the LLM's probability distribution, Predictive-Decoding aims to reduce early errors and promote non-myopic planning.

### Specific Problem Description

#### 1. Definition of the Myopic Problem

The paper defines the myopic gap in LLM planning as:

\[ p^* = \max_{a_0:\ldots:a_T \in P} P(a_0, a_1, \ldots, a_T) - P(a'_0, a'_1, \ldots, a'_T) \]

where \( P \) is the support set of the distribution, and \( a'_0:\ldots:a'_T \) is the sequence generated autoregressively. If \( p^* > 0 \), the LLM is myopic in at least one intermediate step; if \( p^* = 0 \), it has global awareness throughout the planning process.

#### 2. Identifying Errors in Planning

The paper also explores whether LLMs can identify errors in planning at an early stage.
Comparing LLMs' evaluations of intermediate steps against human annotations, the study finds that LLMs struggle to evaluate intermediate steps accurately without future information, but that evaluation accuracy improves significantly once foresight over the next few steps is introduced.

### Solution

#### Predictive-Decoding

The core idea of Predictive-Decoding is to generate multiple foresight trajectories at each step and readjust the original generation distribution according to how those trajectories are evaluated. Specifically, for each step \( a_t \), Predictive-Decoding generates multiple foresight trajectories \( a_t, a_{t+1}, \ldots, a_{t+T_0} \) and then re-weights the generation distribution:

\[ p_\tau(a_t) \propto P_{\text{LLM}}(a_t \mid a'_{<t}, s'_{<t}) \exp\left( \frac{\mathbb{E}_{a_{>t}, s_{>t}} P_{\text{LLM}}(a_t, a_{>t}, s_{>t} \mid a'_{<t}, s'_{<t})}{\tau} \right) \]

In this way, Predictive-Decoding takes future possibilities into account during generation, reducing myopic behavior and improving the accuracy and global optimality of planning.

### Experimental Results

The paper reports experiments on benchmarks covering mathematics, programming, and agent tasks. The results show that Predictive-Decoding significantly improves planning and reasoning accuracy without additional supervision. For example, it outperforms the baseline method by 7.2% on GSM8K and by 25.3% on AlfWorld. Predictive-Decoding is also computationally efficient, achieving better performance under limited computational resources.
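To make the re-weighting rule from the Solution section concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a hypothetical two-step toy model (`TOY_MODEL`), ignores the states \( s \), and estimates the expectation in the exponent by averaging the joint probability of sampled foresight rollouts. All names (`TOY_MODEL`, `sample_rollout`, `predictive_reweight`) are illustrative assumptions.

```python
import math
import random

# Hypothetical toy "LLM": next-step distribution for each action prefix.
# Greedy decoding prefers "A" first, although "B" leads to a more likely plan.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.55, "y": 0.45},
    ("B",): {"x": 0.9, "y": 0.1},
}


def next_step_dist(prefix):
    """Return the toy model's distribution over the next action."""
    return TOY_MODEL[tuple(prefix)]


def sample_rollout(prefix, horizon, rng):
    """Sample up to `horizon` further steps; return the probability of those steps."""
    actions = list(prefix)
    prob = 1.0
    for _ in range(horizon):
        dist = TOY_MODEL.get(tuple(actions))
        if dist is None:  # the toy plan has ended
            break
        choices, weights = zip(*dist.items())
        action = rng.choices(choices, weights=weights)[0]
        prob *= dist[action]
        actions.append(action)
    return prob


def predictive_reweight(prefix, horizon, n_rollouts, tau, rng):
    """Re-weight the next-step distribution using sampled foresight rollouts.

    Implements p_tau(a) proportional to P(a | prefix) * exp(foresight(a) / tau),
    where foresight(a) is a Monte Carlo estimate of E[P(a, future | prefix)].
    """
    base = next_step_dist(prefix)
    log_scores = {}
    for action, p_action in base.items():
        extended = list(prefix) + [action]
        joint = [
            p_action * sample_rollout(extended, horizon, rng)
            for _ in range(n_rollouts)
        ]
        foresight = sum(joint) / n_rollouts
        log_scores[action] = math.log(p_action) + foresight / tau
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    weights = {a: math.exp(s - top) for a, s in log_scores.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    print("base:", next_step_dist(()))
    print("re-weighted:", predictive_reweight((), 1, 2000, 0.01, rng))
```

In this toy setting the base distribution favors `A`, while the re-weighted distribution shifts mass toward `B` for small \( \tau \), because rollouts starting from `B` have a higher expected joint probability. A larger \( \tau \) keeps the result closer to the original distribution.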
In conclusion, the paper proposes Predictive-Decoding to address the myopia of LLMs in reasoning and planning, improving the accuracy and global optimality of planning.
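The myopic gap \( p^* \) defined earlier can be illustrated on a small example. The sketch below is an assumption-laden toy, not taken from the paper: it uses a hypothetical two-step model (`TOY_MODEL`), interprets "generated autoregressively" as greedy step-by-step decoding, and enumerates every sequence in the support to find the most probable one.

```python
# Hypothetical toy model: next-step distribution for each action prefix.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.55, "y": 0.45},
    ("B",): {"x": 0.9, "y": 0.1},
}
HORIZON = 2  # every plan in this toy has exactly two steps


def all_sequences(prefix=()):
    """Yield (sequence, joint probability) for every full-length sequence."""
    if len(prefix) == HORIZON:
        yield prefix, 1.0
        return
    for action, p in TOY_MODEL[prefix].items():
        for seq, rest in all_sequences(prefix + (action,)):
            yield seq, p * rest


def greedy_sequence():
    """Decode step by step, always taking the locally most likely action."""
    prefix, prob = (), 1.0
    while len(prefix) < HORIZON:
        dist = TOY_MODEL[prefix]
        action = max(dist, key=dist.get)
        prob *= dist[action]
        prefix += (action,)
    return prefix, prob


def myopic_gap():
    """Return (gap, best sequence, greedy sequence) for the toy model."""
    best_seq, best_p = max(all_sequences(), key=lambda item: item[1])
    greedy_seq, greedy_p = greedy_sequence()
    return best_p - greedy_p, best_seq, greedy_seq


if __name__ == "__main__":
    gap, best_seq, greedy_seq = myopic_gap()
    print(f"best={best_seq}, greedy={greedy_seq}, gap={gap:.3f}")
```

Here greedy decoding picks `A` first and ends with joint probability 0.6 × 0.55 = 0.33, while the best sequence `(B, x)` has probability 0.4 × 0.9 = 0.36, so the gap is about 0.03 and the toy decoder is myopic at the first step.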

Non-myopic Generation of Language Models for Reasoning and Planning

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning.

Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning

Guiding Language Model Reasoning with Planning Tokens

Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

Explicit Planning Helps Language Models in Logical Reasoning

On the Planning Abilities of Large Language Models : A Critical Investigation

Reasoning with Language Model is Planning with World Model

On the Modeling Capabilities of Large Language Models for Sequential Decision Making

Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models

Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning

Making Large Language Models Better Planners with Reasoning-Decision Alignment

Interactive and Expressive Code-Augmented Planning with Large Language Models

Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation

Reasoning with Large Language Models, a Survey

Translating Natural Language to Planning Goals with Large-Language Models

AdaPlanner: Adaptive Planning from Feedback with Language Models