Abstract: Large Language Models (LLMs) prompted to generate chain-of-thought (CoT) exhibit impressive reasoning capabilities. Recent attempts at prompt decomposition toward solving complex, multi-step reasoning problems depend on the ability of the LLM to simultaneously decompose and solve the problem. A significant disadvantage is that foundational LLMs are typically not available for fine-tuning, making adaptation computationally prohibitive. We believe (and demonstrate) that problem decomposition and solution generation are distinct capabilities, better addressed in separate modules than by one monolithic LLM. We introduce DaSLaM, which uses a decomposition generator to decompose complex problems into subproblems that require fewer reasoning steps. These subproblems are answered by a solver. We use a relatively small (13B parameters) LM as the decomposition generator, which we train using policy gradient optimization to interact with a solver LM (regarded as black-box) and guide it through subproblems, thereby rendering our method solver-agnostic. Evaluation on multiple different reasoning datasets reveals that with our method, a 175 billion parameter LM (text-davinci-003) can produce competitive or even better performance, compared to its orders-of-magnitude larger successor, GPT-4. Additionally, we show that DaSLaM is not limited by the solver's capabilities as a function of scale; e.g., solver LMs with diverse sizes give significant performance improvement with our solver-agnostic decomposition technique. Exhaustive ablation studies evince the superiority of our modular finetuning technique over exorbitantly large decomposer LLMs, based on prompting alone.

What problem does this paper attempt to address?

This paper addresses the difficulty that large language models (LLMs) have with complex, multi-step reasoning problems. Specifically:

1. **Separation of Decomposition and Solving**:
   - Existing methods typically rely on a single large language model to decompose a problem and solve it at the same time. The model must therefore be capable of both, which in practice means a very large model that is hard to fine-tune.
   - The paper proposes separating the two functions: a smaller model is dedicated to problem decomposition, and another model is responsible for solving.
2. **Modular Fine-Tuning**:
   - The proposed method, DaSLaM, uses a decomposition generator (a relatively small 13 billion parameter model) that is trained with policy gradient optimization to guide the solver, which is treated as a black box.
   - The ablation studies indicate that this modular fine-tuning is more effective than prompting a much larger LLM to act as the decomposer.
3. **Performance Improvement**:
   - Experiments on multiple reasoning datasets show that DaSLaM can significantly improve the performance of an existing model (GPT-3.5; the abstract names text-davinci-003) to the point of being competitive with the larger GPT-4.
   - On some tasks, the DaSLaM-enhanced GPT-3.5 model surpasses GPT-4.
4. **Flexibility and Robustness**:
   - DaSLaM is solver-agnostic: its gains are not limited by the scale of the solver, and solver LMs of different sizes all show significant improvements.

Taken together, these results support the paper's claim that modular, specialized components improve the handling of complex reasoning tasks.
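The separation described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the prompt layout, and the toy stand-ins for the two models are all assumptions made for illustration, and the policy-gradient training of the decomposer is not shown. The sketch only shows the control flow in which a decomposer proposes subproblems and a black-box solver answers them before answering the original question.

```python
from typing import Callable, List

# Hypothetical interfaces: both modules are modeled as plain text callables.
Decomposer = Callable[[str], List[str]]  # question -> list of subquestions
Solver = Callable[[str], str]            # prompt -> answer text (black box)


def solve_with_decomposition(question: str, decomposer: Decomposer, solver: Solver) -> str:
    """Answer `question` by routing it through a decomposer and a black-box solver."""
    subquestions = decomposer(question)
    if not subquestions:
        # Nothing to decompose: fall back to asking the solver directly.
        return solver(question)

    # The solver answers each subproblem independently.
    context = []
    for sub in subquestions:
        context.append(f"Q: {sub}\nA: {solver(sub)}")

    # The solver then answers the original question with the sub-answers in context.
    # (This prompt layout is an assumption, not the paper's format.)
    final_prompt = "\n".join(context) + f"\nQ: {question}\nA:"
    return solver(final_prompt)


# --- Toy stand-ins so the sketch runs without any model ---
def toy_decomposer(question: str) -> List[str]:
    # A trained decomposer LM would generate these; here they are hard-coded.
    return ["What is 3 + 4?", "What is 7 * 2?"]


def toy_solver(prompt: str) -> str:
    # A "solver" that only knows two facts and can reuse answers found in its prompt.
    facts = {"What is 3 + 4?": "7", "What is 7 * 2?": "14"}
    if prompt in facts:
        return facts[prompt]
    answers = [line[3:] for line in prompt.splitlines() if line.startswith("A: ")]
    return answers[-1] if answers else "unknown"


if __name__ == "__main__":
    question = "What is (3 + 4) * 2?"
    print(toy_solver(question))                                            # solver alone
    print(solve_with_decomposition(question, toy_decomposer, toy_solver))  # with decomposition
```

Because the solver is only ever called as a text-in, text-out function, any model could be substituted for `toy_solver` without changing the loop, which is the sense in which such a design is solver-agnostic.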

Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning
