Abstract: Rapidly increasing model scales coupled with steering methods such as chain-of-thought prompting have led to drastic improvements in language model reasoning. At the same time, models struggle with compositional generalization and are far from human performance on many reasoning-based benchmarks. Leveraging the success of chain-of-thought prompting, and also taking inspiration from context-aware decoding (CAD), we explore input-based contrasting methods to further encourage the type of reasoning induced by chain-of-thought prompting. While work remains to stabilize these results across datasets and models, the improvements we find warrant further investigation into input-based steering methods for context-aware reasoning.
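For reference, CAD (Shi et al., 2023) samples each token from a distribution that amplifies what the context contributes beyond the model's prior. Here $c$ is the context, $x$ the query, $y_{<t}$ the tokens generated so far, and $\alpha \ge 0$ the contrast strength; $\alpha = 0$ recovers ordinary decoding:

$$
y_t \sim \operatorname{softmax}\!\Big[(1+\alpha)\,\operatorname{logit}_\theta(y_t \mid c, x, y_{<t}) \;-\; \alpha\,\operatorname{logit}_\theta(y_t \mid x, y_{<t})\Big]
$$

An input-based analogue can be sketched by swapping the with-context/without-context pair for two prompts: a chain-of-thought prompt $p_E$ and a weaker prompt $p_A$. This notation is an assumption for illustration, not taken from the paper:

$$
y_t \sim \operatorname{softmax}\!\Big[(1+\alpha)\,\operatorname{logit}_\theta(y_t \mid p_E, y_{<t}) \;-\; \alpha\,\operatorname{logit}_\theta(y_t \mid p_A, y_{<t})\Big]
$$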

What problem does this paper attempt to address?

The paper aims to improve the reasoning ability of large language models by combining Chain-of-Thought (CoT) prompting with Context-Aware Decoding (CAD). Growing model scale and CoT prompting have substantially improved reasoning performance, yet current models still struggle with compositional generalization and remain far below human performance on many reasoning benchmarks. The paper therefore explores input-based contrastive methods that further encourage the kind of reasoning CoT prompting elicits. In particular, when the context contradicts the model's prior knowledge, adding CoT prompting should not weaken the output derived from the given context. The ultimate goal is faithful models that perform mathematical and reasoning tasks accurately without misreading key contextual cues. Specifically, the paper approaches the problem in three ways:

1. **Combining CoT and CAD**: Pair CoT prompting's ability to capture sequential dependencies with CAD's ability to contrast context-conditioned and unconditioned predictions, so that, especially when the context contradicts prior knowledge, adding CoT prompting does not degrade context-based outputs.
2. **Contrastive experiments**: Contrast an expert prompt (an 8-shot CoT prompt) with several amateur prompts (no context, an 8-shot CoT prompt with the questions omitted, and an 8-shot prompt without CoT reasoning) to study how each affects the model's reasoning performance.
3. **Evaluating model performance**: Evaluate the models on GSM8K, AQuA, and CommonSenseQA, which cover math word problems, multiple-choice math questions, and commonsense reasoning, respectively.

Through these methods, the paper aims to improve the accuracy and reliability of large language models on tasks that require reasoning.
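Items 1 and 2 can be illustrated with a small plain-Python sketch. It is a minimal illustration under stated assumptions, not the paper's implementation. The prompt templates, the helper names (`build_prompts`, `contrast_logits`, `greedy_token`), the weight `alpha`, and the toy logit values are all hypothetical. The sketch assumes the contrast takes CAD's form, `(1 + alpha) * expert - alpha * amateur`. That combination would be applied at every decoding step to next-token logits from the same model, run once on the expert prompt and once on one amateur prompt, both followed by the same generated prefix.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical exemplar format: (question, chain-of-thought rationale, final answer).
# The paper uses 8-shot prompts; this sketch accepts any number of exemplars.
Exemplar = Tuple[str, str, str]


def build_prompts(exemplars: Sequence[Exemplar], question: str) -> Dict[str, str]:
    """Build the expert prompt and three amateur variants (templates are assumptions)."""
    target = f"Q: {question}\nA:"
    # Expert: full few-shot CoT (questions, rationales, answers).
    expert = "\n\n".join(f"Q: {q}\nA: {cot} The answer is {a}." for q, cot, a in exemplars)
    # Amateur: CoT exemplars with the questions omitted.
    no_questions = "\n\n".join(f"A: {cot} The answer is {a}." for _, cot, a in exemplars)
    # Amateur: few-shot exemplars without CoT reasoning.
    no_cot = "\n\n".join(f"Q: {q}\nA: The answer is {a}." for q, _, a in exemplars)
    return {
        "expert": f"{expert}\n\n{target}",
        "no_context": target,  # Amateur: no exemplars at all.
        "cot_omitted_questions": f"{no_questions}\n\n{target}",
        "no_cot": f"{no_cot}\n\n{target}",
    }


def contrast_logits(expert: Sequence[float], amateur: Sequence[float], alpha: float) -> List[float]:
    """CAD-style contrast: (1 + alpha) * expert - alpha * amateur, per vocabulary entry."""
    if len(expert) != len(amateur):
        raise ValueError("expert and amateur logits must cover the same vocabulary")
    return [(1 + alpha) * e - alpha * a for e, a in zip(expert, amateur)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn (contrasted) logits into a sampling distribution, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def greedy_token(logits: Sequence[float], vocab: Sequence[str]) -> str:
    """Pick the highest-scoring token (greedy decoding)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return vocab[best]


if __name__ == "__main__":
    demo = [("Tom has 3 apples and buys 2 more. How many apples does he have?",
             "He starts with 3 and adds 2, so 3 + 2 = 5.", "5")]
    prompts = build_prompts(demo, "Ann has 4 pens and buys 3 more. How many pens does she have?")
    print(prompts["cot_omitted_questions"])

    # Made-up next-token logits over a toy three-token vocabulary (not from any model).
    vocab = ["A", "B", "C"]
    expert_logits = [2.0, 1.8, 0.0]
    amateur_logits = [2.0, 0.5, 0.0]
    print(greedy_token(expert_logits, vocab))  # A
    print(greedy_token(contrast_logits(expert_logits, amateur_logits, alpha=1.0), vocab))  # B
```

In the toy run, the expert logits alone favor "A", but the amateur prompt rates "A" just as highly, so the contrast instead promotes "B", the token the expert prompt specifically supports. Setting `alpha=0` recovers plain expert decoding.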

Chain-of-Thought Augmentation with Logit Contrast for Enhanced Reasoning in Language Models