Uncovering Latent Chain of Thought Vectors in Language Models

Jason Zhang, Scott Viteri
2024-09-21
Abstract: As language models grow more influential and trusted in our society, our ability to reliably steer them toward favorable behaviors becomes increasingly paramount. For this, we investigate the technique of steering vectors: biasing the forward pass of language models using a "steering vector" derived from a specific task. We apply them to steer language models toward performing Chain of Thought (CoT) Reasoning without the need to prompt through natural language. We demonstrate this approach on Llama3 8b and Mistral 7b v0.2, and obtain competitive results compared to CoT-prompted performances on a series of reasoning benchmarks (GSM8k, MMLU, AGI Eval, ARC AI2) and qualitative examples. We find this approach yields consistent steering towards CoT responses and takes less compute than traditional methods of fine-tuning models towards CoT.
Computation and Language, Artificial Intelligence
### What problem does this paper attempt to address?

This paper addresses how to reliably guide large language models (LLMs) to perform Chain of Thought (CoT) reasoning. As language models become more influential and trusted in society, it becomes more important to steer them reliably toward desirable behaviors, including careful handling of complex reasoning tasks. The authors study steering vectors: a vector derived from a specific task is used to bias the model's forward pass, so that the model performs Chain of Thought reasoning without a natural-language prompt. According to the authors, this takes less compute than fine-tuning a model toward CoT while giving results competitive with CoT prompting.

### Main contributions

1. **Applying steering vectors to CoT reasoning**: By extracting a steering vector and applying it during the forward pass, the language model can perform Chain of Thought reasoning without relying on natural-language prompts.
2. **Verifying effectiveness**: The approach is evaluated on several reasoning benchmarks (GSM8k, MMLU, AGI Eval, ARC AI2) and on qualitative examples, where it is competitive with traditional CoT prompting.
3. **Reducing computational cost**: Compared with fine-tuning a model toward CoT, the steering vector method requires fewer computational resources.

### Experimental results

On two language models, Llama3 8b Instruct and Mistral 7b v0.2 Instruct, the steering vector method achieved performance comparable to, and in some cases better than, traditional CoT prompting across multiple reasoning benchmarks. For example:

- On the GSM8k dataset, Llama3 8b Instruct reaches 79.15% accuracy with the steering vector, versus 73.90% with CoT prompts.
- On the ARC AI2-C dataset, Mistral 7b v0.2 Instruct reaches 62.70% accuracy with the steering vector, versus 60.75% with CoT prompts.

These results indicate that the steering vector method can guide the model to perform Chain of Thought reasoning and, in some cases, outperform the traditional prompting method.

### Conclusion

This research shows that steering vectors are an effective and efficient way to guide a language model toward Chain of Thought reasoning without sacrificing overall model performance. The method suggests directions for future research, especially on improving the reasoning ability of language models.
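
To make the idea of biasing the forward pass more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. It assumes, for illustration only, that the steering vector is the mean difference between hidden activations recorded with and without a CoT prompt, and that the vector is added to a hidden state with a scaling coefficient. The function names (`mean_vector`, `steering_vector`, `apply_steering`), the `coeff` parameter, and the three-dimensional example activations are all hypothetical; a real model would use hidden states taken from a chosen layer.

```python
# Toy sketch of activation steering; not the authors' implementation.
# Assumption: the steering vector is the mean difference between
# activations collected with a CoT prompt and without one.

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_vector(cot_acts, plain_acts):
    """Mean CoT-prompted activation minus mean unprompted activation."""
    cot_mean = mean_vector(cot_acts)
    plain_mean = mean_vector(plain_acts)
    return [c - p for c, p in zip(cot_mean, plain_mean)]

def apply_steering(hidden, vector, coeff=1.0):
    """Bias a hidden state by adding the scaled steering vector."""
    return [h + coeff * v for h, v in zip(hidden, vector)]

# Made-up activations standing in for hidden states from one layer.
cot_acts = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
plain_acts = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0]]

vec = steering_vector(cot_acts, plain_acts)
steered = apply_steering([0.5, 0.5, 0.5], vec, coeff=2.0)
print(vec, steered)
```

In this sketch the vector is computed once from recorded activations and then reused at inference time, so no model weights are updated. That is consistent with the claim that steering needs less compute than fine-tuning, although the layer choice and scaling used in the paper are not specified in this summary.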