Abstract: Knowledge graphs, which consist of entities and their relations, have become a popular way to store structured knowledge. Knowledge graph embedding (KGE), which derives a representation for each entity and relation, has been widely used to capture the semantics of the information in knowledge graphs, and has demonstrated great success in many downstream applications, such as retrieving entities similar to a query entity. However, existing KGE methods do not work well on emerging large-scale knowledge graphs because of constraints on storage and inference efficiency. In this paper, we propose a lightweight KGE model, LightKG, which significantly reduces both the storage and the running time needed for inference. Instead of storing a continuous vector for every entity, LightKG only needs to store a few codebooks, each of which contains codewords that act as representatives of the embeddings, together with the indices that record the codeword selections for each entity. Hence LightKG achieves highly efficient storage. LightKG also significantly speeds up downstream querying, because the relevance score between the query and an entity can be computed by a quick look-up in a table that contains the scores between the query and the codewords. The storage and inference efficiency of LightKG is achieved by its novel design. LightKG is an end-to-end framework that automatically infers codebooks and codewords and generates an approximated embedding for each entity. A residual module is included in LightKG to induce diversity among the codebooks, and a continuous function is adopted to approximate codeword selection, which is non-differentiable. In addition, to further improve the performance of KGE, we propose a novel dynamic negative sampling method based on quantization, which can be applied to the proposed LightKG or to other KGE methods.
We conduct extensive experiments on five public datasets. The experiments show that LightKG is search- and memory-efficient while retaining high approximate search accuracy. In addition, the dynamic negative sampling method improves model performance by over 19% on average.
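
The look-up idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that an entity's approximate embedding is the sum of one codeword per codebook and that relevance is an inner product, neither of which the abstract states. The codebooks here are random rather than learned, and every name (`build_codebooks`, `score_table`, `score_by_lookup`, `M`, `K`, `D`, `N`) is hypothetical.

```python
import random

def build_codebooks(num_codebooks, num_codewords, dim, rng):
    """Random stand-in codebooks: codebooks[m][k] is a dim-dimensional codeword."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(num_codewords)]
            for _ in range(num_codebooks)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def reconstruct(codebooks, codes):
    """Assumed composition: sum the selected codeword from each codebook."""
    vec = [0.0] * len(codebooks[0][0])
    for book, k in zip(codebooks, codes):
        for i, x in enumerate(book[k]):
            vec[i] += x
    return vec

def score_table(codebooks, query):
    """table[m][k] = <query, codebooks[m][k]>, computed once per query."""
    return [[dot(query, word) for word in book] for book in codebooks]

def score_by_lookup(table, codes):
    """Score one entity with one table look-up per codebook, no dot product."""
    return sum(row[k] for row, k in zip(table, codes))

rng = random.Random(0)
M, K, D, N = 4, 16, 8, 100  # codebooks, codewords per codebook, dim, entities
codebooks = build_codebooks(M, K, D, rng)
# Each entity is stored only as M small integer indices.
entity_codes = [[rng.randrange(K) for _ in range(M)] for _ in range(N)]
query = [rng.gauss(0.0, 1.0) for _ in range(D)]

table = score_table(codebooks, query)
lookup_scores = [score_by_lookup(table, c) for c in entity_codes]

# Reference: rebuild every approximate embedding and take a full dot product.
exact_scores = [dot(query, reconstruct(codebooks, c)) for c in entity_codes]
max_gap = max(abs(a - b) for a, b in zip(lookup_scores, exact_scores))
best_entity = max(range(N), key=lambda e: lookup_scores[e])
```

Under these assumptions the table costs `M * K` dot products per query, after which each entity is scored with `M` additions. Each entity stores `M` indices instead of `D` floating-point values, which is where the storage saving would come from.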

HET-KG: Communication-Efficient Knowledge Graph Embedding Training Via Hotness-Aware Cache

Task-Oriented Genetic Activation for Large-Scale Complex Heterogeneous Graph Embedding

HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework

DGL-KE: Training Knowledge Graph Embeddings at Scale

Efficiently Embedding Dynamic Knowledge Graphs

Highly Efficient Knowledge Graph Embedding Learning with Orthogonal Procrustes Analysis

Efficient Hyper-parameter Search for Knowledge Graph Embedding

HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training

Meta-Knowledge Transfer for Inductive Knowledge Graph Embedding

Federated Knowledge Graph Completion Via Embedding-Contrastive Learning

A data-centric framework of improving graph neural networks for knowledge graph embedding

A Lightweight Knowledge Graph Embedding Framework for Efficient Inference and Storage

Efficient Parallel Translating Embedding For Knowledge Graphs

CogKGE: A Knowledge Graph Embedding Toolkit and Benchmark for Representing Multi-source and Heterogeneous Knowledge

Efficient Non-Sampling Knowledge Graph Embedding

Temporal Knowledge Graph Embedding Via Sparse Transfer Matrix

Hardware-agnostic computation for large-scale knowledge graph embeddings

SSKGE: a time-saving knowledge graph embedding framework based on structure enhancement and semantic guidance

Knowledge Graph Construction of High-Performance Computing Learning Platform

HyCubE: Efficient Knowledge Hypergraph 3D Circular Convolutional Embedding

PIE: a Parameter and Inference Efficient Solution for Large Scale Knowledge Graph Embedding Reasoning