Abstract: Knowledge graphs (KGs) consist of links that describe relationships between entities. Due to the difficulty of manually enumerating all relationships between entities, automatically completing them is essential for KGs. Knowledge Graph Completion (KGC) is a task that infers unseen relationships between entities in a KG. Traditional embedding-based KGC methods, such as RESCAL, TransE, DistMult, ComplEx, RotatE, HAKE, HousE, etc., infer missing links using only the knowledge from training data. In contrast, the recent Pre-trained Language Model (PLM)-based KGC utilizes knowledge obtained during pre-training. Therefore, PLM-based KGC can estimate missing links between entities by reusing memorized knowledge from pre-training without inference. This approach is problematic because building KGC models aims to infer unseen links between entities. However, conventional evaluations in KGC do not consider inference and memorization abilities separately. Thus, a PLM-based KGC method, which achieves high performance in current KGC evaluations, may be ineffective in practical applications. To address this issue, we analyze whether PLM-based KGC methods make inferences or merely access memorized knowledge. For this purpose, we propose a method for constructing synthetic datasets specified in this analysis and conclude that PLMs acquire the inference abilities required for KGC through pre-training, even though the performance improvements mostly come from textual information of entities and relations.

What problem does this paper attempt to address?

The paper primarily explores the performance of Pre-trained Language Models (PLMs) in the task of Knowledge Graph Completion (KGC). Specifically, the study focuses on whether PLMs truly perform reasoning on unknown links or merely rely on the memorized knowledge obtained during pre-training.

### Research Background

Traditional KGC methods mainly rely on the structural information of the knowledge graph itself for link prediction, while recent studies have utilized PLMs to enhance KGC capabilities. These methods can leverage the knowledge learned during pre-training to improve performance. However, this brings up a new issue: are these methods predicting unknown links based on reasoning abilities, or are they simply reusing the knowledge learned during the pre-training phase?

### Research Objectives

The goal of the paper is to analyze whether PLM-based KGC methods are genuinely performing reasoning or merely relying on memory. To achieve this goal, the authors propose a method to construct synthetic datasets that allow researchers to independently evaluate the model's ability to reuse pre-trained knowledge and perform reasoning in KGC tasks.

### Main Contributions

- **Synthetic Dataset Construction**: The paper details several methods for creating synthetic datasets, including settings like virtual worlds, anonymous entities, inconsistent descriptions, and fully anonymous environments. These methods construct different dataset environments by altering the names of entities and relations or replacing their textual descriptions.
- **Experimental Design**: WN18RR, FB15k-237, and Wikidata5m were used as benchmark datasets, and several typical discriminative and generative PLM-based KGC methods were applied for comparison.
- **Results Analysis**: The results show that PLM-based KGC methods do possess the reasoning ability required for KGC, but they rely more on the textual information of entities and relations. Additionally, when pre-training information is removed, the performance of PLM-based methods is comparable to or lower than traditional methods, highlighting the importance of the complementarity between the two approaches.

In summary, through carefully designed experiments, the paper reveals the actual working principles of PLM-based KGC methods in KGC tasks and suggests directions for further exploration to enhance their reasoning capabilities.
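To make the anonymous-entity idea from the summary concrete, here is a minimal Python sketch of one way to rename entities (and optionally relations) in a set of triples. It is an illustration under stated assumptions, not the authors' procedure: the function name `anonymize`, the 8-letter random names, the `seed` and `rename_relations` parameters, and the toy triples are all hypothetical. What it shows is the property such a setting depends on: each original name maps consistently to one meaningless string, so the graph structure is preserved while the surface text no longer matches anything a PLM could have memorized.

```python
import random
import string


def anonymize(triples, seed=0, rename_relations=False):
    """Replace entity names (and optionally relation names) with random strings.

    Hypothetical helper for illustration only. The same original name always
    maps to the same new name, so the graph structure is unchanged.
    """
    rng = random.Random(seed)
    ent_map, rel_map, used = {}, {}, set()

    def fresh():
        # Draw random 8-letter names until one is unused.
        while True:
            name = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
            if name not in used:
                used.add(name)
                return name

    def rename(name, table):
        # Reuse the existing mapping so renaming stays consistent.
        if name not in table:
            table[name] = fresh()
        return table[name]

    out = []
    for head, rel, tail in triples:
        new_rel = rename(rel, rel_map) if rename_relations else rel
        out.append((rename(head, ent_map), new_rel, rename(tail, ent_map)))
    return out, ent_map, rel_map


# Toy triples, assumed for illustration.
triples = [
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
]
anon, ent_map, rel_map = anonymize(triples)
for triple in anon:
    print(triple)
```

With `rename_relations=True` the relation names are replaced as well, which roughly corresponds to a fully anonymous setting. A model that still predicts held-out links on such data cannot be relying on memorized facts about the original names.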

Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion?

Do Pre-trained Models Benefit Knowledge Graph Completion? A Reliable Evaluation and a Reasonable Approach.

Simple knowledge graph completion model based on PU learning and prompt learning

Graph Structure Enhanced Pre-Training Language Model for Knowledge Graph Completion

Knowledge-Infused Pre-trained Models for KG Completion

Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints

Step out of KG: Knowledge Graph Completion via Knowledgeable Retrieval and Reading Comprehension

Progressive Knowledge Graph Completion

In-Context Learning with Topological Information for Knowledge Graph Completion

Knowledge graph extension with a pre-trained language model via unified learning method

MEGA: Meta-Graph Augmented Pre-Training Model for Knowledge Graph Completion

KERMIT: Knowledge Graph Completion of Enhanced Relation Modeling with Inverse Transformation

Text-Augmented Open Knowledge Graph Completion via Pre-Trained Language Models

Link Prediction using Embedded Knowledge Graphs

Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

DKPLM: Decomposable Knowledge-Enhanced Pre-trained Language Model for Natural Language Understanding

KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion

Knowledge Graph Completing with Dual Confrontation Learning Model Based on Variational Information Bottleneck Method

SimKGC: Simple Contrastive Knowledge Graph Completion with Pre-trained Language Models

A Pre-training Framework for Knowledge Graph Completion