Non-myopic Generation of Language Models for Reasoning and Planning

Chang Ma,Haiteng Zhao,Junlei Zhang,Junxian He,Lingpeng Kong
2024-10-29
Abstract:Large Language Models have demonstrated remarkable abilities in reasoning and planning by breaking down complex problems into sequential steps. Despite their success in various domains like mathematical problem-solving and coding, LLMs face challenges in ensuring reliable and optimal planning due to their inherent myopic nature of autoregressive decoding. This paper revisits LLM reasoning from an optimal-control perspective, proposing a novel method, Predictive-Decoding, that leverages Model Predictive Control to enhance planning accuracy. By re-weighting LLM distributions based on foresight trajectories, Predictive-Decoding aims to mitigate early errors and promote non-myopic planning. Our experiments show significant improvements in a wide range of tasks for math, coding, and agents. Furthermore, Predictive-Decoding demonstrates computational efficiency, outperforming search baselines with reduced computational resources. This study provides insights into optimizing LLM planning capabilities.
Artificial Intelligence,Computation and Language
What problem does this paper attempt to address?
The problem that this paper attempts to solve is the myopic problem existing in large language models (LLMs) when performing reasoning and planning. Although LLMs perform well in various tasks, such as mathematical problem - solving and programming, they often only focus on the best choice at the current step during the autoregressive decoding process and ignore the influence of subsequent steps, which may lead to irreversible errors and sub - optimal planning results. Specifically, the paper explores the following two key questions: 1. **Can LLMs actively avoid the occurrence of wrong steps without actually making these errors occur?** 2. **To what extent can the planning strategies based on LLMs achieve optimality?** To solve these problems, the paper re - examines the reasoning process of LLMs from the perspective of optimal control and proposes a new method - **Predictive - Decoding**. This method uses model predictive control (MPC) to enhance the accuracy of planning. By re - weighting the probability distribution of LLMs, predictive decoding aims to reduce early errors and promote non - myopic planning. ### Specific Problem Description #### 1. Definition of the Myopic Problem The paper defines the myopic gap in LLMs planning, that is: \[ p^*=\max_{a_0:\ldots:a_T\in P}P(a_0,a_1,\ldots,a_T)-P(a'_0,a'_1,\ldots,a'_T) \] where \( P \) is the support set of the distribution, and \( a'_0:\ldots:a_T \) is the sequence generated according to autoregression. If \( p^*>0 \), it means that LLMs are myopic in at least one intermediate step; if \( p^* = 0 \), it means having global awareness throughout the planning process. #### 2. Identifying Errors in Planning The paper also explores whether LLMs can identify errors in planning at an early stage. By analyzing the comparison between LLMs' evaluation of intermediate steps and human annotations, the study found that LLMs are difficult to accurately evaluate intermediate steps without future information, but the accuracy of evaluation is significantly improved after introducing foresight information of the next few steps. ### Solution #### Predictive - Decoding The core idea of predictive - decoding is to generate multiple foresight trajectories at each step and readjust the original generation distribution according to the evaluation results of these trajectories. Specifically, for each step \( a_t \), predictive - decoding will generate multiple foresight trajectories \( a_t,a_{t + 1},\ldots,a_{t+T_0} \), and then re - weight the generation distribution according to the evaluation results of these trajectories: \[ p_\tau(a_t)\propto P_{\text{LLM}}(a_t\mid a'_{<t},s'_{<t})\exp\left(\frac{E_{a_{>t},s_{>t}}P_{\text{LLM}}(a_t,a_{>t},s_{>t}\mid a'_{<t},s'_{<t})}{\tau}\right) \] In this way, predictive - decoding can consider future possibilities during the generation process, thereby reducing myopic behavior and improving the accuracy and global optimality of planning. ### Experimental Results The paper conducted experiments on multiple benchmark datasets such as mathematics, programming, and agent tasks. The results show that predictive - decoding significantly improves the accuracy of planning and reasoning without using additional supervision. For example, on the GSM8K dataset, the performance of predictive - decoding is 7.2% higher than that of the baseline method, and on the AlfWorld dataset, it is 25.3% higher. In addition, predictive - decoding also performs well in terms of computational efficiency and can achieve better performance with limited computational resources. In conclusion, by proposing the predictive - decoding method, this paper effectively solves the myopic problem of LLMs in reasoning and planning and improves the accuracy and global optimality of planning.