Eliminating Reasoning via Inferring with Planning: A New Framework to Guide LLMs' Non-linear Thinking

Yongqi Tong, Yifan Wang, Dawei Li, Sizhe Wang, Zi Lin, Simeng Han, Jingbo Shang
2023-11-15
Abstract: Chain-of-Thought (CoT) prompting and its variants explore equipping large language models (LLMs) with high-level reasoning abilities by emulating human-like linear cognition and logic. However, the human mind is complicated and mixes both linear and nonlinear thinking. In this work, we propose Inferential Exclusion Prompting (IEP), a novel prompting method that combines the principles of elimination and inference in order to guide LLMs to think non-linearly. IEP guides LLMs to plan and then utilize Natural Language Inference (NLI) to deduce each possible solution's entailment relation with context, commonsense, or facts, thereby yielding a broader perspective by thinking back for inferring. This forward planning and backward eliminating process allows IEP to better simulate complex human thinking processes than other CoT-based methods, which only reflect linear cognitive processes. We conducted a series of empirical studies and corroborated that IEP consistently outperforms CoT across various tasks. Additionally, we observe that integrating IEP and CoT further improves the LLMs' performance on certain tasks, highlighting the necessity of equipping LLMs with mixed logic processes. Moreover, to better evaluate comprehensive features inherent in human logic, we introduce the Mental-Ability Reasoning Benchmark (MARB). The benchmark comprises six novel subtasks with a total of 9,115 questions, among which 1,685 are developed with hand-crafted rationale references. We believe both IEP and MARB can serve as a promising direction for unveiling LLMs' logic and verbal reasoning abilities and drive further advancements. MARB will be available at `anonymity link` soon.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses the limitations of large language models (LLMs) on complex reasoning tasks. Existing Chain-of-Thought (CoT) methods primarily simulate human linear cognition and logic, whereas human thinking mixes linear and nonlinear processes. As a result, these methods can suffer from error propagation, single-mindedness, and difficulty in internal evaluation when dealing with complex problems.

To address these issues, the authors propose a new prompting method, Inferential Exclusion Prompting (IEP). IEP combines the principles of elimination and inference to guide LLMs toward nonlinear thinking. By first planning possible solutions and then using Natural Language Inference (NLI) to check each solution's relationship with context, commonsense, or facts, IEP can better simulate complex cognitive processes.

Additionally, to evaluate the comprehensive reasoning ability of LLMs, the authors introduce a new benchmark dataset, the Mental-Ability Reasoning Benchmark (MARB). MARB includes six novel subtasks totaling 9,115 questions, of which 1,685 are accompanied by handcrafted reasoning references. These tasks cover various reasoning modes, such as sentence reorganization, riddles, intelligence games, and critical reasoning.

### Main Contributions

1. **Proposing the IEP Method**: IEP simulates human reasoning by elimination by breaking the reasoning process into multiple steps and modeling elimination problems as NLI tasks. Extensive experiments show that IEP performs consistently well across various reasoning tasks.
2. **Releasing the MARB Benchmark**: MARB is a comprehensive and challenging benchmark dataset covering various reasoning games and challenges. The inclusion of handcrafted reasoning references provides deeper insight into the human decision-making process.
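The plan-then-eliminate idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`eliminate_candidates`, `make_table_nli`), the three-way label set, and the rule of ranking entailed candidates ahead of neutral ones are all assumptions made for illustration. In practice the `nli` callable would wrap an LLM or NLI model; here it is replaced by a hand-written lookup table so the example runs on its own.

```python
from typing import Callable, Dict, List, Tuple

# Assumed three-way NLI label set (hypothetical names, not taken from the paper).
ENTAILMENT, NEUTRAL, CONTRADICTION = "entailment", "neutral", "contradiction"

# An NLI judge maps (premise, hypothesis) to one of the labels above.
NliFn = Callable[[str, str], str]


def eliminate_candidates(context: str, candidates: List[str], nli: NliFn) -> List[str]:
    """Drop candidates the judge marks as contradicted by the context.

    Survivors are returned with entailed candidates first, then neutral ones,
    preserving the original order within each group.
    """
    labels = {c: nli(context, c) for c in candidates}
    survivors = [c for c in candidates if labels[c] != CONTRADICTION]
    # sorted() is stable, so ties keep their original relative order.
    return sorted(survivors, key=lambda c: labels[c] != ENTAILMENT)


def make_table_nli(table: Dict[Tuple[str, str], str]) -> NliFn:
    """Build a toy judge from a lookup table; unknown pairs default to neutral."""
    def nli(premise: str, hypothesis: str) -> str:
        return table.get((premise, hypothesis), NEUTRAL)
    return nli


if __name__ == "__main__":
    context = "The metal spoon became hot after sitting in boiling water."
    options = [
        "Metal is a thermal insulator.",
        "Metal conducts heat.",
        "The spoon is old.",
    ]
    # Hand-labelled judgments standing in for a real NLI model.
    judge = make_table_nli({
        (context, options[0]): CONTRADICTION,
        (context, options[1]): ENTAILMENT,
    })
    print(eliminate_candidates(context, options, judge))
```

Passing the judge in as a callable keeps the elimination logic separate from whichever model produces the entailment labels, so the toy table can be swapped for a real model call without changing `eliminate_candidates`.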
### Experimental Results

- **Performance on Existing Benchmarks**: On existing benchmarks such as OpenbookQA, StrategyQA, CommonsenseQA, and LogiQA, IEP and the combination of IEP with CoT significantly outperform other prompting methods. Notably, on OpenbookQA, IEP outperforms CoT by 6.32% without requiring additional computational resources.
- **Performance on MARB**: On MARB, CoT performs poorly on the sentence reorganization task, while IEP better verifies the logical consistency of each candidate answer globally and inversely. However, in handling riddle tasks, CoT may exhibit superior performance to IEP, as riddles often emphasize moments of insight in human cognition, which do not fully align with IEP's meticulous structured reasoning.

### Conclusion

By introducing IEP and MARB, the authors not only improve the performance of LLMs in complex reasoning tasks but also provide new directions and tools for future research. These methods and benchmark datasets are expected to drive further development in the logical and language reasoning capabilities of LLMs.