Abstract: Current approaches to knowledge editing struggle to effectively propagate updates to interconnected facts. In this work, we delve into the barriers that hinder the appropriate propagation of updated knowledge within these models for accurate reasoning. To support our analysis, we introduce a novel reasoning-based benchmark -- ReCoE (Reasoning-based Counterfactual Editing dataset) -- which covers six common reasoning schemes in the real world. We conduct a thorough analysis of existing knowledge editing techniques, including input augmentation, finetuning, and locate-and-edit. We found that all model editing methods show notably low performance on this dataset, especially in certain reasoning schemes. Our analysis of the chain-of-thought generation of edited models further uncovers key reasons behind the inadequacy of existing knowledge editing methods from a reasoning standpoint, involving aspects of fact-wise editing, fact recall ability, and coherence in generation. We will make our benchmark publicly available.

What problem does this paper attempt to address?

The paper addresses the following problem: when a fact is updated, current knowledge editing methods fail to propagate the update to interconnected facts, which prevents the edited model from reasoning accurately. Specifically, the authors found that existing knowledge editing techniques (input augmentation, finetuning, and locate-and-edit) perform poorly on tasks that require reasoning, with especially large performance drops in certain reasoning patterns.

To investigate this problem, the authors introduced a new reasoning-based benchmark dataset, ReCoE (Reasoning-based Counterfactual Editing dataset), which covers six common reasoning patterns: superlative, comparative, ranking, counting, aggregation, and subtraction. Using this dataset, the authors conducted a comprehensive analysis of existing knowledge editing methods, revealing their deficiencies in fact-wise editing, fact recall, and generation coherence.

The main contributions of the paper are:

1. **Introduced a reasoning-based knowledge editing evaluation framework**: It covers the key aspects needed for effective reasoning. The analysis reveals the challenges and limitations related to knowledge propagation.
2. **Proposed the ReCoE dataset**: This is a novel and challenging reasoning-based counterfactual editing benchmark that contains diverse reasoning patterns and is closer to real-world scenarios.

Through these efforts, the authors hope to provide valuable insights for future model editing techniques and guide their development.

### Formula Representation

In the paper, the authors use formulas to describe the probability distributions involved in the reasoning process. For example, the probability that the edited language model \(P'\) assigns to the correct answer, \(P'(CA|Q)\), can be decomposed into two parts:

\[P'(CA|Q) = P'(CF|Q)\cdot P'(CA|Q, CF)\]

where:

- \(P'(CF|Q)\) represents the probability that the model recalls the counterfactual \(CF\) given the question \(Q\).
- \(P'(CA|Q, CF)\) represents the probability that the model generates the correct answer \(CA\) given the question \(Q\) and the counterfactual \(CF\).

These two parts correspond to the abilities of fact recall and coherent generation, respectively.

### Summary

By introducing the ReCoE dataset and the reasoning-based evaluation framework, this paper analyzes in depth why existing knowledge editing methods fail to propagate updated knowledge, and how their performance varies across reasoning patterns. The authors hope that these studies can provide a theoretical basis and practical guidance for improving knowledge editing techniques in the future.
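The decomposition \(P'(CA|Q) = P'(CF|Q)\cdot P'(CA|Q, CF)\) given under Formula Representation can be illustrated with a minimal Python sketch. The function name `answer_probability` and all numeric values below are hypothetical, chosen only to show how the two factors interact; they are not results or code from the paper.

```python
def answer_probability(p_recall: float, p_answer_given_fact: float) -> float:
    """Return P'(CA|Q) = P'(CF|Q) * P'(CA|Q, CF).

    p_recall            -- P'(CF|Q): probability the edited model recalls
                           the counterfactual fact for the question.
    p_answer_given_fact -- P'(CA|Q, CF): probability the model produces the
                           correct answer once the fact has been recalled.
    """
    for name, p in (("p_recall", p_recall),
                    ("p_answer_given_fact", p_answer_given_fact)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    return p_recall * p_answer_given_fact


# Hypothetical values for illustration only (not taken from the paper).
scenarios = {
    "weak recall, coherent generation": (0.2, 0.9),
    "strong recall, incoherent generation": (0.9, 0.2),
    "strong recall, coherent generation": (0.9, 0.9),
}

for label, (p_recall, p_gen) in scenarios.items():
    print(f"{label}: P'(CA|Q) = {answer_probability(p_recall, p_gen):.2f}")
```

Because the two factors multiply, the final answer probability can never exceed the smaller of them, so a failure in either fact recall or coherent generation is enough to cap reasoning accuracy.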

Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks

AKEW: Assessing Knowledge Editing in the Wild

EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries

Pioneering Reliable Assessment in Text-to-Image Knowledge Editing: Leveraging a Fine-Grained Dataset and an Innovative Criterion

Knowledge Editing through Chain-of-Thought

DeepEdit: Knowledge Editing as Decoding with Constraints

Joint Knowledge Editing for Information Enrichment and Probability Promotion

Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks

Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework

Uncovering Overfitting in Large Language Model Editing

Outdated Issue Aware Decoding for Reasoning Questions on Edited Knowledge

Unveiling the Pitfalls of Knowledge Editing for Large Language Models

How Well Can Knowledge Edit Methods Edit Perplexing Knowledge?

Keys to Robust Edits: from Theoretical Insights to Practical Advances

ComprehendEdit: A Comprehensive Dataset and Evaluation Framework for Multimodal Knowledge Editing

Evaluating the Ripple Effects of Knowledge Editing in Language Models

Untying the Reversal Curse via Bidirectional Language Model Editing