Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering

Qingru Zhang, Xiaodong Yu, Chandan Singh, Xiaodong Liu, Liyuan Liu, Jianfeng Gao, Tuo Zhao, Dan Roth, Hao Cheng
2024-09-17
Abstract: Large language models (LLMs) have demonstrated remarkable performance across various real-world tasks. However, they often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or hallucinated. This difficulty increases for contexts that are long or contain distracting information, which can divert LLMs from fully capturing essential evidence. To address this issue, many works use prompting to help LLMs utilize contextual information more faithfully. For instance, iterative prompting highlights key information in two steps that first ask the LLM to identify important pieces of context and then derive answers accordingly. However, prompting methods are constrained to highlighting key information implicitly in token space, which is often insufficient to fully steer the model's attention. To improve model faithfulness more reliably, we propose AutoPASTA, a method that automatically identifies key contextual information and explicitly highlights it by steering an LLM's attention scores. Like prompting, AutoPASTA is applied at inference time and does not require changing any model parameters. Our experiments on open-book QA demonstrate that AutoPASTA effectively enables models to grasp essential contextual information, leading to substantially improved model faithfulness and performance, e.g., an average improvement of 7.95% for LLAMA3-70B-Instruct. Code will be publicly available at https://github.com/QingruZhang/AutoPASTA.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the unfaithfulness of large language models (LLMs) to their input context. Although LLMs perform well across many practical tasks, they often fail to fully understand and use the context they are given, which leads to inaccurate or hallucinated responses. The problem is more pronounced when the context is long or contains distracting information, since both can keep the model from capturing key evidence. Many studies try to help LLMs use contextual information more faithfully through prompting. For example, iterative prompting highlights key information in two steps: the LLM is first asked to identify the important parts of the context and then to generate an answer based on them. However, prompting can only highlight key information implicitly in token space, which is often insufficient to fully steer the model's attention. The paper therefore proposes AutoPASTA, a method that automatically identifies key contextual information and highlights it explicitly by steering the LLM's attention scores. Like prompting, AutoPASTA is applied at inference time and does not change any model parameters. Experiments on open-book question answering show that AutoPASTA substantially improves model faithfulness and performance, for example an average improvement of 7.95% for LLAMA3-70B-Instruct.
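The abstract does not spell out how attention scores are steered, so the following is only a minimal sketch of one plausible reading: for a single attention row, the weights on positions that were not identified as key information are scaled down by a coefficient and the row is renormalized, so that the highlighted positions receive a larger share of attention. The function names, the `alpha` coefficient, and its default value are assumptions made for illustration and are not taken from the paper or its code.

```python
import math
from typing import List, Sequence, Set


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of attention logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def steer_attention(weights: Sequence[float],
                    highlighted: Set[int],
                    alpha: float = 0.1) -> List[float]:
    """Down-weight non-highlighted positions by alpha, then renormalize.

    `alpha` is a hypothetical scaling coefficient (0 < alpha <= 1);
    alpha = 1 leaves the row unchanged.
    """
    if not highlighted:
        # Nothing was identified as key information: leave the row as is.
        return list(weights)
    scaled = [w if i in highlighted else alpha * w
              for i, w in enumerate(weights)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Four context tokens with equal logits; token 1 is the identified key span.
row = softmax([1.0, 1.0, 1.0, 1.0])
steered = steer_attention(row, highlighted={1}, alpha=0.1)
print(row)
print(steered)
```

In a real model such a step would have to be applied inside selected attention heads during the forward pass, which is why this kind of steering needs no change to model parameters; which heads and which coefficient to use are choices this sketch leaves open.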