DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

Murong Yue, Wenlin Yao, Haitao Mi, Dian Yu, Ziyu Yao, Dong Yu

2024-10-05

Abstract: Enhancing the capability of large language models (LLMs) in reasoning has gained significant attention in recent years. Previous studies have demonstrated the effectiveness of various prompting strategies in aiding LLMs in reasoning (called "reasoning actions"), such as step-by-step thinking, reflecting before answering, solving with programs, and their combinations. However, these approaches often applied static, predefined reasoning actions uniformly to all questions, without considering the specific characteristics of each question or the capability of the task-solving LLM. In this paper, we propose DOTS, an approach enabling LLMs to reason dynamically via optimal reasoning trajectory search, tailored to the specific characteristics of each question and the inherent capability of the task-solving LLM. Our approach involves three key steps: i) defining atomic reasoning action modules that can be composed into various reasoning action trajectories; ii) searching for the optimal action trajectory for each training question through iterative exploration and evaluation for the specific task-solving LLM; and iii) using the collected optimal trajectories to train an LLM to plan for the reasoning trajectories of unseen questions. In particular, we propose two learning paradigms, i.e., fine-tuning an external LLM as a planner to guide the task-solving LLM, or directly fine-tuning the task-solving LLM with an internalized capability for reasoning actions planning. Our experiments across eight reasoning tasks show that our method consistently outperforms static reasoning techniques and the vanilla instruction tuning approach. Further analysis reveals that our method enables LLMs to adjust their computation based on problem complexity, allocating deeper thinking and reasoning to harder problems.

Artificial Intelligence, Computation and Language, Machine Learning

What problem does this paper attempt to address?

### The Problem the Paper Aims to Solve

The paper aims to improve the reasoning capabilities of large language models (LLMs) on reasoning tasks. It proposes DOTS, a method that lets an LLM reason dynamically via optimal reasoning trajectory search. Existing methods typically apply static, predefined reasoning strategies uniformly to all questions, without considering the characteristics of each question or the capability of the task-solving LLM. DOTS addresses this through three key steps:

1. **Defining Atomic Reasoning Modules**: Define basic reasoning action modules that can be composed into different reasoning paths.
2. **Searching for the Optimal Reasoning Path**: For each training question, iteratively explore and evaluate candidate paths to find the one that best suits the specific task-solving LLM.
3. **Training LLMs for Reasoning Path Planning**: Use the collected optimal reasoning paths to train an LLM to plan reasoning paths for unseen questions.

Through this approach, DOTS allows LLMs to adjust their reasoning strategy to the complexity of the problem and to their own capabilities. Experimental results across eight reasoning tasks show that DOTS consistently outperforms static reasoning techniques and the vanilla instruction-tuning approach. Additionally, DOTS demonstrates stability and generalization across different datasets, particularly excelling on out-of-distribution (OOD) tasks.
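
The three steps can be illustrated with a minimal sketch. It is not the authors' implementation: the action names, the stage grouping in `ACTIONS`, the tie-break toward shorter trajectories, and the `toy_solve` stand-in for the task-solving LLM are all assumptions made for illustration. The sketch composes atomic actions into trajectories, scores each trajectory by how often it produces the reference answer, and collects the best trajectory per question as planner training data.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Trajectory = Tuple[str, ...]

# Hypothetical atomic actions grouped by stage; "" means "skip this stage".
ACTIONS: Dict[str, List[str]] = {
    "analysis": ["", "rewrite", "decompose"],
    "solution": ["step_by_step", "program"],
    "verification": ["", "self_verify"],
}


def enumerate_trajectories() -> List[Trajectory]:
    """Step 1: compose one action per stage into every candidate trajectory."""
    combos = itertools.product(*ACTIONS.values())
    return [tuple(a for a in combo if a) for combo in combos]


def search_best_trajectory(
    question: str,
    answer: str,
    solve: Callable[[str, Trajectory], str],
    n_samples: int = 4,
) -> Tuple[Trajectory, float]:
    """Step 2: pick the trajectory with the highest success rate.

    Ties go to the shorter trajectory (an assumption of this sketch).
    """
    best: Trajectory = ()
    best_key = (-1.0, 0)
    for traj in enumerate_trajectories():
        hits = sum(solve(question, traj) == answer for _ in range(n_samples))
        key = (hits / n_samples, -len(traj))
        if key > best_key:
            best, best_key = traj, key
    return best, best_key[0]


def build_planner_data(
    dataset: List[Tuple[str, str]],
    solve: Callable[[str, Trajectory], str],
) -> List[dict]:
    """Step 3: keep (question, best trajectory) pairs as planner training data."""
    examples = []
    for question, answer in dataset:
        traj, rate = search_best_trajectory(question, answer, solve)
        if rate > 0:  # drop questions that no trajectory solves
            examples.append({"question": question, "trajectory": list(traj)})
    return examples


def toy_solve(question: str, traj: Trajectory) -> str:
    """Stand-in for the task-solving LLM (purely illustrative).

    "hard" questions are only solved with a program plus self-verification.
    """
    if question.startswith("hard"):
        return "42" if ("program" in traj and "self_verify" in traj) else "?"
    return "42"


if __name__ == "__main__":
    data = [("easy: 40 + 2", "42"), ("hard: nested sum", "42")]
    for example in build_planner_data(data, toy_solve):
        print(example)
```

In this toy setup the easy question is assigned a single-action trajectory while the hard one needs two actions, which mirrors the idea that harder problems receive deeper reasoning. A real system would replace `toy_solve` with sampled calls to the task-solving LLM and fine-tune a planner on the collected pairs.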


Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning

Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing

Can LLMs Reason in the Wild with Programs?

Let's Be Self-generated via Step by Step: A Curriculum Learning Approach to Automated Reasoning with Large Language Models

StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving

DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models

Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths

Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

State Machine of Thoughts: Leveraging Past Reasoning Trajectories for Enhancing Problem Solving

Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning

TPD: Enhancing Student Language Model Reasoning via Principle Discovery and Guidance

Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation

Democratizing Reasoning Ability: Tailored Learning from Large Language Model

Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization

Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

Thought-Like-Pro: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Thought

Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models

Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning