Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?

Peter Hase, Thomas Hofweber, Xiang Zhou, Elias Stengel-Eskin, Mohit Bansal
2024-06-28
Abstract: The model editing problem concerns how language models should learn new facts about the world over time. While empirical research on model editing has drawn widespread attention, the conceptual foundations of model editing remain shaky -- perhaps unsurprisingly, since model editing is essentially belief revision, a storied problem in philosophy that has eluded succinct solutions for decades. Model editing nonetheless demands a solution, since we need to be able to control the knowledge within language models. With this goal in mind, this paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research. We first describe 12 open problems with model editing, based on challenges with (1) defining the problem, (2) developing benchmarks, and (3) assuming LLMs have editable beliefs in the first place. Many of these challenges are extremely difficult to address, e.g. determining far-reaching consequences of edits, labeling probabilistic entailments between facts, and updating beliefs of agent simulators. Next, we introduce a semi-synthetic dataset for model editing based on Wikidata, where we can evaluate edits against labels given by an idealized Bayesian agent. This enables us to say exactly how belief revision in language models falls short of a desirable epistemic standard. We encourage further research exploring settings where such a gold standard can be compared against. Our code is publicly available at: https://github.com/peterbhase/LLM-belief-revision
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper examines model editing in large language models (LLMs) and groups its open problems into three areas:

1. **Conceptual Challenges**: Difficulties in defining the model editing problem itself.
   - **Background Belief Problem**: What an LLM should conclude after an update depends on its prior beliefs, which makes it hard to say which post-edit conclusions are rational.
   - **Multiple Possible Worlds Problem**: New information is usually compatible with several possible world states, and determining which one is most likely is difficult.
   - **Complete Corrigibility Problem**: Requiring an LLM to accept any belief update can have unpredictable consequences.
   - **Missing Context Problem**: Edits are typically made without conversational or physical context, which makes the intended content of the update hard to interpret.
   - **Consistency Cost Problem**: Keeping beliefs consistent after an edit carries a cost that must be weighed against other goals.
2. **Benchmark Development Challenges**: Practical difficulties in constructing datasets for evaluating model editing.
   - **Factual Entailment Annotation Difficulty**: Labeling the probabilistic relationships between facts is complex and hard to standardize.
   - **Ambiguous Factual Statements**: Many common factual statements are highly imprecise, which makes annotation harder.
   - **Error Correction Strategy**: Benchmarks for error correction must specify which errors are to be fixed and which models are expected to make them.
3. **Challenges of Assuming Editable Beliefs in LLMs**: Whether current LLMs have editable beliefs at all, and how to adjust the confidence attached to them.
   - **LLMs as Agents or Agent Simulators**: Does an LLM express a single set of beliefs, or the beliefs of different agents in different contexts?
   - **LLMs as Agents or Databases**: Does an LLM maintain consistent beliefs, or does it merely act as a passive store of data?
   - **No Learned Belief Update Mechanism**: Why should a small amount of supervised fine-tuning correspond to a principled belief revision process?
   - **How to Edit Confidence**: LLMs express uncertainty in language in several ways; which of these should an edit target?

To address these issues, the paper proposes a semi-synthetic dataset built from Wikidata and evaluates model edits against the labels given by an idealized Bayesian agent. This makes it possible to quantify more precisely where model editing falls short and provides a more formal starting point for future research.
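The idea of scoring an edit against an idealized Bayesian agent can be illustrated with a small sketch. This is not the paper's implementation: the Dirichlet-categorical model, the function names `posterior` and `total_variation`, the example attribute values, and the `model_probs` numbers are all assumptions chosen for illustration. The sketch computes the posterior an ideal agent would hold after observing one additional fact, then measures how far a hypothetical edited model's probabilities are from that posterior.

```python
from collections import Counter


def posterior(observations, outcomes, alpha=1.0):
    """Posterior predictive over `outcomes` under a symmetric Dirichlet(alpha) prior.

    Assumed model for illustration only: each observation is a draw from a
    categorical distribution over `outcomes`.
    """
    counts = Counter(observations)
    total = len(observations) + alpha * len(outcomes)
    return {o: (counts[o] + alpha) / total for o in outcomes}


def total_variation(p, q):
    """Total variation distance between two distributions over the same outcomes."""
    return 0.5 * sum(abs(p[o] - q.get(o, 0.0)) for o in p)


# Hypothetical attribute values and prior evidence.
outcomes = ["doctor", "teacher", "lawyer"]
evidence = ["doctor", "doctor", "teacher"]

# Beliefs of the idealized agent before and after seeing one new fact.
before = posterior(evidence, outcomes)
after = posterior(evidence + ["lawyer"], outcomes)

# Hypothetical probabilities read off an edited language model.
model_probs = {"doctor": 0.2, "teacher": 0.1, "lawyer": 0.7}

# Distance between the edited model and the Bayesian gold standard.
gap = total_variation(after, model_probs)
print(f"Bayesian posterior after edit: {after}")
print(f"Gap to edited model: {gap:.3f}")
```

In this toy case the edited model puts far more mass on the new fact than a single observation justifies, so the gap is large. The sketch is meant to show only the shape of the comparison: a gold-standard posterior exists, so the shortfall of an edit can be measured rather than judged informally.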