Jay Shim, Grant Kruttschnitt, Alyssa Ma, Daniel Kim, Benjamin Chek, Athul Anand, Kevin Zhu, Sean O'Brien
Abstract: Rapidly increasing model scales coupled with steering methods such as chain-of-thought prompting have led to drastic improvements in language model reasoning. At the same time, models struggle with compositional generalization and are far from human performance on many reasoning-based benchmarks. Leveraging the success of chain-of-thought prompting, and also taking inspiration from context-aware decoding (CAD), we explore input-based contrasting methods to further encourage the type of reasoning induced by chain-of-thought prompting. While work remains to stabilize these results across datasets and models, the improvements we find warrant further investigation into input-based steering methods for context-aware reasoning.
What problem does this paper attempt to address?
This paper aims to improve the reasoning ability of large-scale language models by combining Chain-of-Thought (CoT) prompting with Context-Aware Decoding (CAD). Growing model scale and CoT prompting have significantly improved reasoning performance, yet models still struggle with compositional generalization and fall well short of human performance on many reasoning-based benchmarks. The paper therefore explores input-based contrastive methods that further encourage the type of reasoning induced by CoT prompting, particularly when the context contradicts prior knowledge, so that introducing CoT prompting does not weaken output derived from the given context. The ultimate goal is faithful models that perform mathematical and reasoning tasks accurately without misreading key context cues.
Specifically, the paper approaches the problem in three ways:
1. **Combining CoT and CAD**: Pair CoT prompting's ability to capture sequential dependencies with CAD's ability to discriminate between context and prior knowledge, especially when the two conflict, so that introducing CoT prompting does not degrade context-based output.
2. **Contrastive experiments**: Contrast an expert prompt (an 8-shot CoT prompt) with different types of amateur prompts (no context, an 8-shot CoT prompt with the questions omitted, or an 8-shot prompt without CoT reasoning) and study how each choice affects the model's reasoning performance.
3. **Evaluating model performance**: Evaluate the model on several datasets, including GSM8K, AQuA, and CommonsenseQA, which cover mathematical problems, multiple-choice questions, and commonsense reasoning problems, respectively.
Through these methods, the paper aims to improve the accuracy and reliability of large-scale language models on tasks that require reasoning.
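To make the expert/amateur contrast concrete, here is a minimal sketch in plain Python. It rests on assumptions not confirmed by the summary above: the prompt template (`Q:` / `A:` / "The answer is"), the function names, and the CAD-style adjustment `(1 + alpha) * log p_expert - alpha * log p_amateur` are illustrative choices, and the value of `alpha` is hypothetical. The paper's exact formulation may differ.

```python
import math


def build_prompts(demos, question):
    """Build one expert prompt and three amateur prompts.

    demos: list of (question, rationale, answer) few-shot examples.
    The template below is a hypothetical format, not the paper's.
    """
    expert = "".join(f"Q: {q}\nA: {r} The answer is {a}.\n\n" for q, r, a in demos)
    # Amateur variant: keep the reasoning but omit the demo questions.
    no_question = "".join(f"A: {r} The answer is {a}.\n\n" for _, r, a in demos)
    # Amateur variant: keep the demo questions but drop the CoT reasoning.
    no_cot = "".join(f"Q: {q}\nA: The answer is {a}.\n\n" for q, _, a in demos)
    target = f"Q: {question}\nA:"
    return {
        "expert": expert + target,
        "amateur_no_context": target,
        "amateur_no_question": no_question + target,
        "amateur_no_cot": no_cot + target,
    }


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrast_scores(expert_logits, amateur_logits, alpha=0.5):
    """Assumed CAD-style contrast over next-token scores.

    Tokens the expert prompt favors more than the amateur prompt are
    boosted; alpha=0 reduces to ordinary decoding from the expert prompt.
    """
    expert_lp = log_softmax(expert_logits)
    amateur_lp = log_softmax(amateur_logits)
    return [(1 + alpha) * e - alpha * a for e, a in zip(expert_lp, amateur_lp)]


# Toy next-token logits over a three-token vocabulary (made-up numbers).
expert_logits = [2.0, 1.0, 0.0]
amateur_logits = [2.0, 0.0, 1.0]
scores = contrast_scores(expert_logits, amateur_logits, alpha=0.5)
next_token = max(range(len(scores)), key=scores.__getitem__)
```

In practice the two logit vectors would come from running the same model on the expert prompt and on one of the amateur prompts at each decoding step; the toy lists above only stand in for that.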