Guiding Language Model Reasoning with Planning Tokens

Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni
2024-08-07
Abstract: Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought (CoT) reasoning. However, most of the existing approaches to enhance this ability rely heavily on data-driven methods, while neglecting the structural aspects of the model's reasoning capacity. To encourage a more structural generation of CoT steps, we propose a hierarchical generation scheme: we let the LM generate a planning token at the start of each reasoning step, intuitively serving as a high-level plan of the current step, and add their embeddings to the model parameters. Our approach requires a negligible increase in trainable parameters (0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate our method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements across three math word problem datasets and one multihop QA dataset with respect to standard fine-tuning baselines.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper asks how to guide large language models (LLMs) toward more structured chain-of-thought (CoT) reasoning. Most existing methods for improving LLM reasoning are data-driven and neglect the structural aspects of the model's reasoning capacity. The authors propose a hierarchical generation scheme: at the start of each reasoning step, the language model generates a planning token that serves as a high-level plan for that step, and the embeddings of these tokens are added to the model parameters. The method increases the number of trainable parameters by only about 0.001% and can be applied through either full fine-tuning or a parameter-efficient fine-tuning scheme. Applied to three different LLMs, it yields notable accuracy improvements over standard fine-tuning baselines on three math word problem datasets and one multi-hop QA dataset.
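To make the idea of a planning token at the start of each reasoning step concrete, here is a minimal sketch of how a chain-of-thought training target could be augmented. It is an illustration under stated assumptions, not the authors' implementation: the token names (`<plan_add>` and so on), the operator-based rule in `assign_plan_token`, and the helper `add_planning_tokens` are all hypothetical, and the summary above does not say how planning tokens are actually chosen.

```python
import re

# Hypothetical planning-token vocabulary; the names are assumptions for
# illustration and do not come from the summary above.
_OP_TO_TOKEN = {
    "+": "<plan_add>",
    "-": "<plan_sub>",
    "*": "<plan_mul>",
    "/": "<plan_div>",
}
_FALLBACK_TOKEN = "<plan_other>"

# Match an arithmetic operator that sits between two digits, e.g. "3 + 4",
# so that hyphens inside ordinary words are not mistaken for subtraction.
_OP_PATTERN = re.compile(r"\d\s*([+\-*/])\s*\d")


def assign_plan_token(step: str) -> str:
    """Assumed heuristic: use the first arithmetic operator in the step."""
    match = _OP_PATTERN.search(step)
    return _OP_TO_TOKEN[match.group(1)] if match else _FALLBACK_TOKEN


def add_planning_tokens(steps: list[str]) -> str:
    """Prefix every reasoning step with its planning token and join them."""
    return " ".join(f"{assign_plan_token(step)} {step}" for step in steps)


if __name__ == "__main__":
    cot_steps = [
        "Tom has 3 + 4 = 7 apples.",
        "He gives away 7 - 2 = 5.",
        "The answer is 5.",
    ]
    print(add_planning_tokens(cot_steps))
```

In a fine-tuning setup, each planning token would be added to the tokenizer vocabulary as a single new token, so the only new trainable parameters are the embedding vectors for those tokens. That is consistent with the very small parameter increase reported above, although the exact count depends on the model and the number of planning tokens.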