Guiding Language Model Reasoning with Planning Tokens

Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni

2024-08-07

Abstract: Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought (CoT) reasoning. However, most of the existing approaches to enhance this ability rely heavily on data-driven methods, while neglecting the structural aspects of the model's reasoning capacity. To encourage a more structural generation of CoT steps, we propose a hierarchical generation scheme: we let the LM generate a planning token at the start of each reasoning step, intuitively serving as a high-level plan of the current step, and add their embeddings to the model parameters. Our approach requires a negligible increase in trainable parameters (0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate our method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements across three math word problem datasets and one multihop QA dataset with respect to standard fine-tuning baselines.

Computation and Language, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

This paper asks how to guide large language models (LLMs) toward more structured chain-of-thought (CoT) reasoning. Most existing methods for improving LLM reasoning are data-driven and neglect the structural aspects of the model's reasoning capacity. The authors propose a hierarchical generation scheme: at the start of each reasoning step, the language model generates a planning token that serves as a high-level plan for that step, and the embeddings of these tokens are added to the model parameters. The method increases the number of trainable parameters by a negligible amount (0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. Applied to three different LLMs, it yields notable accuracy improvements over standard fine-tuning baselines on three math word problem datasets and one multi-hop QA dataset.
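To make the idea of a planning token at the start of each reasoning step concrete, here is a minimal data-preparation sketch. It is not the authors' implementation. The token names (`<plan_add>` and so on), the rule that picks a token from the first arithmetic operator in a step, and all function names are assumptions made for illustration; the summary above does not say how planning tokens are chosen.

```python
import re

# Hypothetical planning-token vocabulary (assumed for illustration only).
PLANNING_TOKENS = {
    "+": "<plan_add>",
    "-": "<plan_sub>",
    "*": "<plan_mul>",
    "/": "<plan_div>",
}
DEFAULT_TOKEN = "<plan_other>"


def assign_planning_token(step: str) -> str:
    """Pick a planning token from the first arithmetic operator between digits.

    This heuristic is an assumption; any rule that maps a step to a small
    set of discrete labels could be substituted here.
    """
    match = re.search(r"\d\s*([+\-*/])\s*\d", step)
    if match is None:
        return DEFAULT_TOKEN
    return PLANNING_TOKENS[match.group(1)]


def annotate_chain_of_thought(steps: list[str]) -> str:
    """Prefix every reasoning step with its planning token, one step per line."""
    return "\n".join(f"{assign_planning_token(step)} {step}" for step in steps)


def new_embedding_rows(existing_vocab: set[str]) -> list[str]:
    """List the planning tokens that would each need a new embedding row."""
    candidates = sorted(set(PLANNING_TOKENS.values()) | {DEFAULT_TOKEN})
    return [token for token in candidates if token not in existing_vocab]


if __name__ == "__main__":
    example_steps = [
        "Tom has 3 + 4 = 7 apples.",
        "He gives away 7 - 2 = 5 apples.",
        "So the answer is 5.",
    ]
    print(annotate_chain_of_thought(example_steps))
    print(new_embedding_rows({"Tom", "has"}))
```

Fine-tuning on text annotated this way would teach the model to emit a planning token before writing each step. The only new parameters are the embedding rows for the added tokens, which is consistent with the very small parameter increase reported above.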

Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models

Language Model Non-myopic Generation for Reasoning and Planning

Non-myopic Generation of Language Model for Reasoning and Planning

Non-myopic Generation of Language Models for Reasoning and Planning

Improving Planning with Large Language Models: A Modular Agentic Architecture

Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

Reasoning with Language Model is Planning with World Model

Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning

Extending Token Computation for LLM Reasoning

Explicit Planning Helps Language Models in Logical Reasoning

Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought

Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation

On the Planning Abilities of Large Language Models: A Critical Investigation

Learning to Plan by Updating Natural Language

Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning

A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models

Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure

Multimodal Chain-of-Thought Reasoning in Language Models

Translating Natural Language to Planning Goals with Large-Language Models

Chain-of-Symbol Prompting Elicits Planning in Large Language Models