Abstract: State-of-the-art large language models (LLMs) exhibit impressive problem-solving capabilities but may struggle with complex reasoning and factual correctness. Existing methods combine chain-of-thought and retrieval-augmented generation (RAG): they decompose a complex problem into simpler steps and apply retrieval to improve factual correctness. These methods work well on straightforward reasoning tasks but often falter on challenging tasks such as competitive programming and mathematics, due to frequent reasoning errors and irrelevant knowledge retrieval. To address this, we introduce Critic-guided planning with Retrieval-augmentation (CR-Planner), a novel framework that leverages fine-tuned critic models to guide both reasoning and retrieval through planning. CR-Planner solves a problem by iteratively selecting and executing sub-goals. At each iteration, it first identifies the most promising sub-goal among reasoning, query generation, and retrieval, guided by rewards from a critic model called the sub-goal critic. It then executes this sub-goal by sampling candidate outputs and selecting the best one according to a second critic model, the execution critic. This iterative process, informed by retrieved information and the critic models, enables CR-Planner to navigate the solution space effectively towards the final answer. We employ Monte Carlo Tree Search (MCTS) to collect the training data for the critic models, allowing systematic exploration of action sequences and their long-term impacts. We validate CR-Planner on challenging domain-knowledge-intensive and reasoning-heavy tasks, including competitive programming, theorem-driven math reasoning, and complex domain retrieval problems. Our experiments demonstrate that CR-Planner significantly outperforms baselines, highlighting its effectiveness in addressing challenging problems by improving both reasoning and retrieval.
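The iterative select-then-execute loop described in the abstract can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation. It assumes both critics are plain scoring functions, that selection is greedy (arg-max), and that the LLM sampler and retriever are stubbed out. All names (`plan`, `State`, `SUB_GOALS`, the `toy_*` stubs, and the `ANSWER:` stop marker) are hypothetical. The MCTS procedure used to collect critic training data is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical labels for the three sub-goal types named in the abstract.
SUB_GOALS = ("reason", "generate_query", "retrieve")


@dataclass
class State:
    """Partial solution: the problem plus the executed sub-goal outputs so far."""
    problem: str
    steps: List[str] = field(default_factory=list)
    done: bool = False


def plan(
    problem: str,
    sub_goal_critic: Callable[[State, str], float],
    sampler: Callable[[State, str, int], Sequence[str]],
    execution_critic: Callable[[State, str, str], float],
    num_samples: int = 4,
    max_steps: int = 8,
) -> State:
    """Greedy critic-guided loop (assumed): pick a sub-goal, then its best execution."""
    state = State(problem)
    for _ in range(max_steps):
        # Step 1: choose the sub-goal that the sub-goal critic rewards most.
        goal = max(SUB_GOALS, key=lambda g: sub_goal_critic(state, g))
        # Step 2: sample candidate executions, keep the one the execution critic prefers.
        candidates = sampler(state, goal, num_samples)
        best = max(candidates, key=lambda c: execution_critic(state, goal, c))
        state.steps.append(f"{goal}: {best}")
        # Assumed stop condition (hypothetical): a candidate starting with "ANSWER:".
        if best.startswith("ANSWER:"):
            state.done = True
            break
    return state


# Toy stand-ins for the LLM sampler and the two fine-tuned critics (hypothetical).
def toy_sub_goal_critic(state: State, goal: str) -> float:
    # Reward query generation first, then retrieval, then reasoning.
    order = ["generate_query", "retrieve", "reason"]
    return 1.0 if goal == order[min(len(state.steps), 2)] else 0.0


def toy_sampler(state: State, goal: str, n: int) -> List[str]:
    drafts = [f"{goal} draft {i}" for i in range(n)]
    if goal == "reason":
        drafts.append("ANSWER: 42")  # one sampled reasoning step reaches an answer
    return drafts


def toy_execution_critic(state: State, goal: str, candidate: str) -> float:
    # Prefer a final answer; otherwise all drafts tie and max() keeps the first.
    return 1.0 if candidate.startswith("ANSWER:") else 0.0


result = plan("toy problem", toy_sub_goal_critic, toy_sampler, toy_execution_critic)
print(result.steps)
```

The greedy arg-max choices stand in for "identifies the most promising sub-goal" and "selecting the optimal output"; in a real system the stubs would wrap an LLM, a retriever, and the trained critic models, and the selection policy could differ from this assumption.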

SLR: A million-scale comprehensive crossword dataset for simultaneous learning and reasoning

Knowledge Crosswords: Geometric Knowledge Reasoning with Large Language Models

Language Models are Crossword Solvers

Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models

Puzzle Solving using Reasoning of Large Language Models: A Survey

Piecing Together Clues: A Benchmark for Evaluating the Detective Skills of Large Language Models

CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge

Cruciform: Solving Crosswords with Natural Language Processing

CLR-Bench: Evaluating Large Language Models in College-level Reasoning

Optimizing Language Model's Reasoning Abilities with Weak Supervision

BRAINTEASER: Lateral Thinking Puzzles for Large Language Models

Eliciting Better Multilingual Structured Reasoning from LLMs through Code

Cumulative Reasoning with Large Language Models

MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning

A Crossword Solving System Based on Monte Carlo Tree Search

Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks

Missed Connections: Lateral Thinking Puzzles for Large Language Models

REL: Working out is all you need

From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models

DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text