Abstract: Chain-of-thought (CoT) prompting has been shown to empirically improve the accuracy of large language models (LLMs) on various question answering tasks. While understanding why CoT prompting is effective is crucial to ensuring that this phenomenon is a consequence of desired model behavior, little work has addressed this; nonetheless, such an understanding is a critical prerequisite for responsible model deployment. We address this question by leveraging gradient-based feature attribution methods which produce saliency scores that capture the influence of input tokens on model output. Specifically, we probe several open-source LLMs to investigate whether CoT prompting affects the relative importances they assign to particular input tokens. Our results indicate that while CoT prompting does not increase the magnitude of saliency scores attributed to semantically relevant tokens in the prompt compared to standard few-shot prompting, it increases the robustness of saliency scores to question perturbations and variations in model output.
What problem does this paper attempt to address?
This paper seeks to explain why Chain-of-Thought (CoT) prompting improves the accuracy of large language models (LLMs) on various question-answering tasks. Although CoT prompting has been shown empirically to improve model performance, the underlying mechanism remains poorly understood. Understanding how CoT prompting works is necessary both to confirm that the improvement stems from the desired model behavior and to deploy these models responsibly.
Specifically, the authors address this problem in the following ways:
1. **Use gradient-based feature attribution methods**: These methods produce saliency scores that capture the influence of each input token on the model output.
2. **Examine how CoT prompting changes the importance the model assigns to specific input tokens**: By comparing the saliency scores of input tokens under different prompting methods, the authors study whether and how CoT prompting affects the relative importance the model assigns to parts of the input.
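To make the idea of a gradient-based saliency score concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a toy scalar model (`toy_model`, a sigmoid over mean-pooled token embeddings) and a gradient-times-input attribution estimated by finite differences; the function names, the toy model, and the choice of attribution variant are all illustrative assumptions, since the summary does not specify which method the paper uses.

```python
import math


def toy_model(embeddings, weights):
    """Hypothetical stand-in for an LLM: sigmoid of a weighted sum of mean-pooled embeddings."""
    n = len(embeddings)
    pooled = [sum(e[d] for e in embeddings) / n for d in range(len(weights))]
    z = sum(w * p for w, p in zip(weights, pooled))
    return 1.0 / (1.0 + math.exp(-z))


def token_saliency(embeddings, weights, eps=1e-5):
    """Per-token gradient-times-input saliency, with gradients from central finite differences."""
    scores = []
    for i, emb in enumerate(embeddings):
        total = 0.0
        for d in range(len(emb)):
            # Perturb one coordinate of one token embedding in both directions.
            plus = [list(e) for e in embeddings]
            minus = [list(e) for e in embeddings]
            plus[i][d] += eps
            minus[i][d] -= eps
            grad = (toy_model(plus, weights) - toy_model(minus, weights)) / (2 * eps)
            total += grad * emb[d]
        # Use the magnitude so that scores are comparable across tokens.
        scores.append(abs(total))
    return scores


# Three "tokens" with increasingly large embeddings along the direction the model uses.
example_embeddings = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
example_weights = [1.0, 0.0]
example_scores = token_saliency(example_embeddings, example_weights)
```

In this toy setting, a token whose embedding is zero receives no saliency, and a token with a larger component along the weight direction receives a proportionally larger score. A real study would obtain gradients by backpropagation through the LLM rather than by finite differences.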
### Research Background and Motivation
With the rapid development of large language models (such as models based on the Transformer architecture), both researchers and the public have shown great interest in them. However, the opacity of their internal mechanisms makes it important to understand and interpret their behavior. This holds in particular for newer strategies such as CoT prompting: understanding how they work is crucial to ensuring that the model behaves as expected, safely, and reliably.
### Main Research Questions
1. **Does CoT prompting increase the saliency scores of semantically relevant tokens?**
- The authors hypothesize that CoT prompting makes the model pay more attention to important input tokens, even as the input length increases.
2. **Does CoT prompting make model behavior more robust to question restatements?**
- The authors hypothesize that, under CoT prompting, saliency scores change less across different formulations of the same question; that is, the model focuses on relevant tokens more stably.
3. **Does CoT prompting make model gradients more stable across randomly generated outputs?**
- The authors hypothesize that CoT prompting reduces the variation in saliency scores between different outputs, thereby improving the model's robustness to the randomness of text generation.
### Experimental Design
The authors used open-source models such as GPT-J (6 billion parameters) and conducted experiments on multiple question-answering datasets. By comparing saliency scores under standard prompts and CoT prompts, the authors reached the following conclusions:
- The CoT prompt does not significantly increase the saliency scores of semantically related tokens, but improves the model's accuracy on some datasets.
- The CoT prompt makes the model more robust to question restatements, with smaller changes in saliency scores.
- The CoT prompt makes the model gradients more stable in different outputs, and the variance of saliency scores decreases.
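The two robustness findings above can be read as simple statistics over saliency vectors. The sketch below is one possible way to quantify them, not the metric used in the paper: `paraphrase_shift` and `output_variance` are hypothetical helper names, and the code assumes that saliency scores have already been computed and aligned to the same semantically relevant tokens in every version of the question or sampled output.

```python
from statistics import mean, pvariance


def paraphrase_shift(original, perturbed):
    """Mean absolute change in saliency between aligned tokens of two question versions."""
    if len(original) != len(perturbed):
        raise ValueError("saliency vectors must be aligned to the same tokens")
    return mean(abs(a - b) for a, b in zip(original, perturbed))


def output_variance(saliency_runs):
    """Average per-token variance of saliency across several sampled outputs."""
    # zip(*runs) regroups the scores so that each tuple holds one token's scores across runs.
    return mean(pvariance(values) for values in zip(*saliency_runs))


# Illustrative numbers only; they are not results from the paper.
shift = paraphrase_shift([0.5, 0.3], [0.4, 0.5])
spread = output_variance([[1.0, 2.0], [3.0, 2.0]])
```

Under this reading, a prompting method is more robust when it yields a smaller `paraphrase_shift` between a question and its restatement, and a smaller `output_variance` across repeated generations for the same prompt.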
### Conclusion
Although CoT prompting does not significantly improve accuracy on smaller-scale models, it does change how the model attends to input tokens, making the saliency scores more stable and consistent. This suggests that CoT prompting may improve performance by changing how the model processes its input, not just by generating more reasonable explanations.
In summary, this study offers a new perspective on how CoT prompting works and lays a foundation for further exploration of the behavior of large language models.