Self-Discover: Large Language Models Self-Compose Reasoning Structures

Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng
2024-02-06
Abstract: We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x fewer inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.
Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper addresses the difficulty large language models (LLMs) have with complex reasoning tasks that typical prompting methods handle poorly. It proposes SELF-DISCOVER, a framework in which an LLM discovers and composes a reasoning structure tailored to each task. The framework improves on existing methods in three respects:

1. **Adaptability**: The LLM selects relevant modules from a set of atomic reasoning modules (for example, critical thinking and step-by-step thinking), adapts them to the task at hand, and composes them into an explicit, task-specific reasoning structure that it follows during decoding.
2. **Efficiency**: Compared with inference-intensive methods such as CoT-Self-Consistency, SELF-DISCOVER needs only a few additional inference steps. The abstract reports 10-40x less inference compute while still outperforming those methods by more than 20%.
3. **Interpretability**: Because the reasoning structure is explicit, it is more intuitive to read and makes the model's output easier to interpret.

The paper evaluates SELF-DISCOVER on several benchmarks (BigBench-Hard, Thinking for Doing, and MATH) and reports gains over direct answering and Chain of Thought (CoT), by as much as 32% relative to CoT. The discovered reasoning structures also transfer across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama 2.
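The select, adapt, and compose flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `LLM` callable interface, the prompt wording, the one-call-per-stage split, and the `RecordingStub` stand-in are all assumptions made for the example, and only the two module names come from the abstract.

```python
from typing import Callable, List

# Assumed interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]

# The abstract names only these two modules; a real module list would be longer.
REASONING_MODULES = [
    "Use critical thinking.",
    "Think step by step.",
]


def discover_structure(llm: LLM, task_examples: List[str], modules: List[str]) -> str:
    """Run once per task: select modules, adapt them, compose a structure."""
    examples = "\n".join(task_examples)
    module_text = "\n".join(modules)
    # Stage 1 (hypothetical prompt): pick the modules relevant to the task.
    selected = llm(
        f"Select the reasoning modules useful for these tasks.\n"
        f"Modules:\n{module_text}\nTasks:\n{examples}"
    )
    # Stage 2 (hypothetical prompt): make the chosen modules task-specific.
    adapted = llm(
        f"Rephrase the selected modules so they are specific to the tasks.\n"
        f"Selected:\n{selected}\nTasks:\n{examples}"
    )
    # Stage 3 (hypothetical prompt): compose an explicit reasoning structure.
    return llm(
        f"Compose the adapted modules into a step-by-step reasoning structure.\n"
        f"Adapted:\n{adapted}\nTasks:\n{examples}"
    )


def solve(llm: LLM, structure: str, instance: str) -> str:
    """Run once per instance: follow the discovered structure while answering."""
    return llm(
        f"Follow this reasoning structure to solve the task.\n"
        f"Structure:\n{structure}\nTask:\n{instance}"
    )


class RecordingStub:
    """Stand-in for a model: records prompts and returns placeholder text."""

    def __init__(self) -> None:
        self.prompts: List[str] = []

    @property
    def calls(self) -> int:
        return len(self.prompts)

    def __call__(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return f"stub-output-{self.calls}"


if __name__ == "__main__":
    stub = RecordingStub()
    structure = discover_structure(stub, ["example task"], REASONING_MODULES)
    for instance in ["instance 1", "instance 2"]:
        solve(stub, structure, instance)
    print(stub.calls)  # discovery calls plus one call per instance
```

In this sketch the structure is discovered once per task and then reused, so each instance costs a single extra model call. That is one plausible reading of why the approach is cheaper than sampling many reasoning paths per instance, though the exact accounting in the paper may differ.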