Abstract: This paper investigates an under-explored challenge in large language models (LLMs): chain-of-thought prompting with noisy rationales, i.e., in-context learning examples whose rationales contain irrelevant or inaccurate reasoning thoughts. We construct the NoRa dataset, tailored to evaluating the robustness of reasoning in the presence of noisy rationales. Our findings on NoRa reveal a prevalent vulnerability to such noise among current LLMs, with existing robust methods like self-correction and self-consistency showing limited efficacy. Notably, compared to prompting with clean rationales, the base LLM's accuracy drops by 1.4%-19.8% with irrelevant thoughts and, more drastically, by 2.2%-40.4% with inaccurate thoughts. Addressing this challenge necessitates external supervision that should be accessible in practice. Here, we propose contrastive denoising with noisy chain-of-thought (CD-CoT). It enhances LLMs' denoising-reasoning capabilities by contrasting noisy rationales with only one clean rationale, which can be the minimal requirement for denoising-purpose prompting. This method follows a principle of exploration and exploitation: (1) rephrasing and selecting rationales in the input space to achieve explicit denoising, and (2) exploring diverse reasoning paths and voting on answers in the output space. Empirically, CD-CoT achieves an average accuracy improvement of 17.8% over the base model and shows significantly stronger denoising capabilities than baseline methods. The source code is publicly available at: https://github.com/tmlr-group/NoisyRationales.
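The two-stage exploration/exploitation procedure described above can be sketched in Python. This is a minimal, hypothetical illustration and not the authors' implementation (their repository holds that). Several details are assumptions made only for the sketch: the LLM is modeled as a plain function from prompt to text, answers follow an `Answer:` marker, the prompt wording is invented, a rephrased rationale is kept only if its final answer matches the original example's answer, and reasoning paths are diversified by rotating the demonstration order (a real system might instead sample with temperature). All function names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Assumption: an LLM is any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def extract_answer(text: str) -> str:
    """Hypothetical convention: the final answer follows the last 'Answer:' marker."""
    return text.rsplit("Answer:", 1)[-1].strip()


def rephrase(llm: LLM, clean: str, noisy: Sequence[str]) -> List[str]:
    """Input space, exploration: rewrite each noisy demo by contrasting it with the clean one."""
    outputs = []
    for demo in noisy:
        prompt = (
            "Here is a clean example:\n" + clean + "\n\n"
            "Rewrite the following example, removing irrelevant or incorrect "
            "reasoning steps and keeping its final answer:\n" + demo
        )
        outputs.append(llm(prompt))
    return outputs


def select(noisy: Sequence[str], rephrased: Sequence[str]) -> List[str]:
    """Input space, exploitation (assumed rule): keep rewrites whose answer is unchanged."""
    return [r for n, r in zip(noisy, rephrased) if extract_answer(r) == extract_answer(n)]


def majority_vote(answers: Sequence[str]) -> str:
    """Output space, exploitation: most frequent answer; ties go to the first one seen."""
    return Counter(answers).most_common(1)[0][0]


def cd_cot_sketch(llm: LLM, clean: str, noisy: Sequence[str],
                  question: str, n_paths: int = 3) -> str:
    """Hypothetical end-to-end pipeline: denoise demos, run several paths, then vote."""
    demos = [clean] + select(noisy, rephrase(llm, clean, noisy))
    answers = []
    for i in range(n_paths):
        # Output space, exploration: rotate the demo order to diversify reasoning paths.
        shift = i % len(demos)
        ordered = demos[shift:] + demos[:shift]
        prompt = "\n\n".join(ordered) + "\n\nQ: " + question + "\nA:"
        answers.append(extract_answer(llm(prompt)))
    return majority_vote(answers)
```

For example, with a stub such as `fake = lambda p: "cleaned steps. Answer: 4"`, calling `cd_cot_sketch(fake, "c Answer: 1", ["n Answer: 4"], "q")` keeps the rewritten demo (its answer is unchanged) and returns `"4"` after voting. The only component the abstract specifies directly is the overall shape: denoising in the input space, then diverse paths plus answer voting in the output space.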

LLaMA-LoRA Neural Prompt Engineering: A Deep Tuning Framework for Automatically Generating Chinese Text Logical Reasoning Thinking Chains

ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting

Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models

LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning

Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?

Let's Be Self-generated via Step by Step: A Curriculum Learning Approach to Automated Reasoning with Large Language Models

CoF-CoT: Enhancing Large Language Models with Coarse-to-Fine Chain-of-Thought Prompting for Multi-domain NLU Tasks

Cause-Aware Empathetic Response Generation via Chain-of-Thought Fine-Tuning

Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Chain-of-Thought in Large Language Models: Decoding, Projection, and Activation

Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models

Active Prompting with Chain-of-Thought for Large Language Models

Pattern-Aware Chain-of-Thought Prompting in Large Language Models

Supervised Chain of Thought

Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning

Chain-of-Thought Reasoning Without Prompting

Latent Skill Discovery for Chain-of-Thought Reasoning

R³ Prompting: Review, Rephrase and Resolve for Chain-of-Thought Reasoning in Large Language Models under Noisy Context

Meta Reasoning for Large Language Models

Chain-of-Thought Prompting for Speech Translation