Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering

Qingru Zhang, Xiaodong Yu, Chandan Singh, Xiaodong Liu, Liyuan Liu, Jianfeng Gao, Tuo Zhao, Dan Roth, Hao Cheng

2024-09-17

Abstract: Large language models (LLMs) have demonstrated remarkable performance across various real-world tasks. However, they often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or hallucinated. This difficulty increases for contexts that are long or contain distracting information, which can divert LLMs from fully capturing essential evidence. To address this issue, many works use prompting to help LLMs utilize contextual information more faithfully. For instance, iterative prompting highlights key information in two steps that first ask the LLM to identify important pieces of context and then derive answers accordingly. However, prompting methods are constrained to highlighting key information implicitly in token space, which is often insufficient to fully steer the model's attention. To improve model faithfulness more reliably, we propose AutoPASTA, a method that automatically identifies key contextual information and explicitly highlights it by steering an LLM's attention scores. Like prompting, AutoPASTA is applied at inference time and does not require changing any model parameters. Our experiments on open-book QA demonstrate that AutoPASTA effectively enables models to grasp essential contextual information, leading to substantially improved model faithfulness and performance, e.g., an average improvement of 7.95% for LLAMA3-70B-Instruct. Code will be publicly available at https://github.com/QingruZhang/AutoPASTA.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses the unfaithfulness of large language models (LLMs) to their input context. Although LLMs perform well on many practical tasks, they often fail to fully understand and use the context they are given, which leads to unfaithful or hallucinated responses. The problem is more pronounced when the context is long or contains distracting information, since both can keep the model from capturing the key evidence. Many studies try to help LLMs use contextual information more faithfully through prompting. Iterative prompting, for example, highlights key information in two steps: the LLM is first asked to identify the important pieces of context, and then to derive an answer from them. However, prompting can only highlight key information implicitly, in token space, which is often insufficient to fully steer the model's attention. The paper therefore proposes AutoPASTA, a method that automatically identifies key contextual information and highlights it explicitly by steering the LLM's attention scores. Like prompting, AutoPASTA is applied at inference time and does not change any model parameters. Experiments on open-book question answering show that AutoPASTA substantially improves model faithfulness and performance, for example an average improvement of 7.95% for LLAMA3-70B-Instruct. In summary, the paper aims to make LLMs more faithful and more accurate on long or distracting contexts by automatically identifying key contextual information and explicitly highlighting it at the attention level.
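The summary above says that key information is highlighted by steering attention scores, but it does not give the steering rule itself. As a rough illustration only, the sketch below assumes one plausible rule: within a single attention row, the weights of non-highlighted tokens are scaled down by a factor `alpha` and the row is renormalized, so the highlighted tokens receive a larger share. The function name `steer_attention_row`, the parameter `alpha`, and the choice of rule are assumptions made for this example, not the paper's confirmed implementation.

```python
def steer_attention_row(weights, highlighted, alpha=0.1):
    """Shift attention mass toward highlighted positions in one attention row.

    weights:     post-softmax attention weights for one query (non-negative, sum to 1)
    highlighted: set of token indices judged to be key context (hypothetical input)
    alpha:       scaling factor in (0, 1] applied to NON-highlighted positions
                 (an assumed hyperparameter for this sketch)
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    # Down-weight every position that was not marked as key context.
    scaled = [w if i in highlighted else alpha * w for i, w in enumerate(weights)]

    # Renormalize so the row is a valid probability distribution again.
    total = sum(scaled)
    if total == 0.0:
        return list(weights)
    return [s / total for s in scaled]


# Toy usage: four tokens with uniform attention, token 1 marked as key evidence.
row = [0.25, 0.25, 0.25, 0.25]
steered = steer_attention_row(row, highlighted={1}, alpha=0.5)
print(steered)
```

In a real model, a rule of this kind would be applied inside chosen attention heads at inference time, with the highlighted indices coming from the model's own identification of key context. Both of those steps are outside the scope of this toy example.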

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

Enhancing Large Language Models' Situated Faithfulness to External Contexts

Context-faithful Prompting for Large Language Models

FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"

Context Matter: Data-Efficient Augmentation of Large Language Models for Scientific Applications

Investigating Context-Faithfulness in Large Language Models: The Roles of Memory Strength and Evidence Style

Evaluating Human Alignment and Model Faithfulness of LLM Rationale

Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories

LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

Context-DPO: Aligning Language Models for Context-Faithfulness

Advancing Large Language Model Attribution through Self-Improving

Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models

Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference

Large Language Models Can Self-Improve in Long-context Reasoning

Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control

LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts

Introspective Tips: Large Language Model for In-Context Decision Making

Attribute or Abstain: Large Language Models as Long Document Assistants