Abstract: Knowledge Graph Embedding models, representing entities and edges in a low-dimensional space, have been extremely successful at solving tasks related to completing and exploring Knowledge Graphs (KGs). One of the key aspects of training most of these models is teaching them to discriminate between true statements (positives) and false ones (negatives). However, the way in which negatives can be defined is not trivial, as facts missing from the KG are not necessarily false and a set of ground truth negatives is hardly ever given. This makes synthetic negative generation a necessity. Different generation strategies can heavily affect the quality of the embeddings, making it a primary aspect to consider. We revamp a strategy that generates corruptions during training respecting the domain and range of relations, extend its capabilities, and show that our methods bring substantial improvements (+10% MRR) on standard benchmark datasets and over +150% MRR on a larger ontology-backed dataset.
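The core idea in the abstract, corrupting a triple only with entities whose type fits the relation's domain (when replacing the head) or range (when replacing the tail), can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: entity types come from a hypothetical `entity_types` map, each relation's domain and range are inferred from the types observed in training triples (an ontology could supply them instead), and corruptions that coincide with a known positive are rejected. All function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict

Triple = tuple[str, str, str]


def build_domain_range(triples, entity_types):
    """Infer each relation's domain (range) as the set of types observed at its
    head (tail) position. Assumption: an ontology could supply these instead."""
    domain, range_ = defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        domain[r] |= entity_types[h]
        range_[r] |= entity_types[t]
    return domain, range_


def index_by_type(entity_types):
    """Map each type to the sorted list of entities carrying it."""
    index = defaultdict(set)
    for entity, types in entity_types.items():
        for ty in types:
            index[ty].add(entity)
    return {ty: sorted(es) for ty, es in index.items()}


def corrupt(triple, domain, range_, type_index, known, rand, max_tries=100):
    """Replace the head or tail of `triple` with an entity whose type lies in the
    relation's domain or range; skip candidates that are known positives."""
    h, r, t = triple
    corrupt_head = rand.random() < 0.5
    allowed = domain[r] if corrupt_head else range_[r]
    candidates = sorted({e for ty in allowed for e in type_index.get(ty, [])})
    if not candidates:
        return None  # relation has no typed domain/range to sample from
    for _ in range(max_tries):
        e = rand.choice(candidates)
        negative = (e, r, t) if corrupt_head else (h, r, e)
        if negative not in known:
            return negative
    return None  # every sampled candidate was a known positive


# Toy KG (hypothetical data, for illustration only).
entity_types = {
    "aspirin": {"Compound"}, "ibuprofen": {"Compound"},
    "headache": {"Disease"}, "fever": {"Disease"}, "TP53": {"Gene"},
}
triples: list[Triple] = [("aspirin", "treats", "headache"),
                         ("ibuprofen", "treats", "fever")]

domain, range_ = build_domain_range(triples, entity_types)
type_index = index_by_type(entity_types)
known = set(triples)
rand = random.Random(0)
print(corrupt(triples[0], domain, range_, type_index, known, rand))
```

Because `TP53` is typed `Gene`, it never appears as a corruption of a `treats` triple, whereas uniform sampling over all entities could produce a type-violating negative such as `("aspirin", "treats", "TP53")`, which is likely easy for the model to reject.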

### What problem does this paper attempt to solve?

This paper addresses a key problem in training Knowledge Graph Embedding (KGE) models: **how to generate high-quality synthetic negatives**. KGE models support the completion and exploration of Knowledge Graphs (KGs) by learning low-dimensional representations of entities and relations. A core part of training is teaching the model to distinguish true statements (positive samples) from false ones (negative samples). Generating suitable negatives is hard for two reasons:

1. **Missing facts are not necessarily false**: a fact absent from the KG may still be true.
2. **No ground-truth negatives**: a labeled set of negative samples is hardly ever provided.

Together, these make **synthetic negative generation** an indispensable part of training. Because the choice of generation strategy can heavily affect embedding quality, it deserves careful consideration.

To address this, the paper revamps and extends an existing strategy: **domain- and range-aware synthetic negative generation**, which produces corruptions during training that respect the domain and range of each relation. According to the authors, this improves the quality and diversity of the negatives. Experiments show an improvement of about 10% in Mean Reciprocal Rank (MRR) on standard benchmark datasets and of more than 150% on a larger, ontology-backed dataset.

### Specific improvement points

- **Combination with uniform random sampling**: Some classes have too few instances, so sampling only within them would repeat the same negatives. The authors therefore combine domain- and range-based generation with uniform random sampling to keep the negatives diverse and useful.
- **Applicability to different types of KGs**: The method works on standard benchmark datasets and is particularly suited to biological datasets with a clear ontology structure (such as Hetionet), where ontology-defined classes yield more meaningful and diverse negatives.
- **Low computational overhead**: Compared with more complex negative generation methods, the approach adds very little computational overhead, so training stays efficient while performance improves.

In conclusion, by improving the negative generation strategy, this paper substantially improves the performance of KGE models and provides a useful reference for subsequent research.
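The first improvement point, mixing domain- and range-aware draws with uniform random sampling, can be illustrated with the sketch below. It is a hypothetical sketch, not the paper's exact scheme: it corrupts tails only, uses a made-up mixing probability `p_typed`, infers each relation's range from the types of observed tails, and bounds the rejection loop with a hypothetical `max_tries` parameter.

```python
import random
from collections import defaultdict


def make_tail_sampler(triples, entity_types, p_typed=0.5, seed=0, max_tries=1000):
    """Build a tail-corruption sampler mixing range-aware and uniform draws.

    Hypothetical mixing rule (not necessarily the paper's): each draw comes from
    the relation's range pool with probability `p_typed`, otherwise uniformly
    from all entities, so tiny type classes are not resampled over and over."""
    rand = random.Random(seed)
    all_entities = sorted(entity_types)
    by_type = defaultdict(set)
    for entity, types in entity_types.items():
        for ty in types:
            by_type[ty].add(entity)
    range_pool = defaultdict(set)
    for _, r, t in triples:
        for ty in entity_types[t]:
            range_pool[r] |= by_type[ty]  # any entity sharing an observed tail type
    range_pool = {r: sorted(es) for r, es in range_pool.items()}
    known = set(triples)

    def sample_tails(h, r, k):
        """Return up to k negatives (h, r, e) that are not known positives."""
        negatives = []
        for _ in range(max_tries):
            if len(negatives) == k:
                break
            typed = r in range_pool and rand.random() < p_typed
            e = rand.choice(range_pool[r] if typed else all_entities)
            if (h, r, e) not in known:
                negatives.append((h, r, e))
        return negatives

    return sample_tails


# Hypothetical toy KG: the range class of "treats" has a single member.
entity_types = {"aspirin": {"Compound"}, "ibuprofen": {"Compound"},
                "headache": {"Disease"}, "TP53": {"Gene"}, "BRCA1": {"Gene"}}
triples = [("aspirin", "treats", "headache")]

sampler = make_tail_sampler(triples, entity_types, p_typed=0.5, seed=0)
print(sampler("ibuprofen", "treats", 4))
```

In this toy KG the `Disease` class has a single member, so with `p_typed=1.0` the sampler returns `("ibuprofen", "treats", "headache")` every time for head `ibuprofen`, and nothing at all for head `aspirin`, because that single candidate is a known positive. The uniform share of the draws is what restores diversity when a class is this small.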

Domain and Range Aware Synthetic Negatives Generation for Knowledge Graph Embedding Models

Knowledge Distillation Improves Graph Structure Augmentation for Graph Neural Networks

Treat Different Negatives Differently: Enriching Loss Functions with Domain and Range Constraints for Link Prediction

Efficient Non-Sampling Knowledge Graph Embedding

Generative Knowledge Graph Construction: A Review

A Spatially Constraint Negative Sample Generation Method for Geographic Knowledge Graph Embedding

Diversified and Adaptive Negative Sampling on Knowledge Graphs

KG-NSF: Knowledge Graph Completion with a Negative-Sample-Free Approach

Incorporating Domain and Range of Relations for Knowledge Graph Completion

Biomedical Knowledge Graph Embeddings with Negative Statements

Negative Sampling in Knowledge Graph Representation Learning: A Review

Learning Structured Embeddings of Knowledge Graphs with Generative Adversarial Framework

Entity Aware Negative Sampling with Auxiliary Loss of False Negative Prediction for Knowledge Graph Embedding

NegatER: Unsupervised Discovery of Negatives in Commonsense Knowledge Bases

NeuralKG: an Open Source Library for Diverse Representation Learning of Knowledge Graphs

Meta-Knowledge Transfer for Inductive Knowledge Graph Embedding

Knowledge Graph Embedding with Diversity of Structures

Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings

Seq2KG: An End-to-End Neural Model for Domain Agnostic Knowledge Graph (not Text Graph) Construction from Text

Analysis of the Impact of Negative Sampling on Link Prediction in Knowledge Graphs

Universal Knowledge Graph Embedding Framework Based on High-Quality Negative Sampling and Weighting