Abstract: Knowledge graph embedding (KGE), which maps entities and relations into vector representations, is essential for downstream applications. Conventional KGE methods require high-dimensional representations to learn the complex structure of a knowledge graph, but this leads to oversized model parameters. Recent advances reduce parameters by using low-dimensional entity representations, while developing techniques (e.g., knowledge distillation or reinvented representation forms) to compensate for the reduced dimension. However, such operations introduce complicated computations and model designs that may not benefit large knowledge graphs. To seek a simple strategy for improving the parameter efficiency of conventional KGE models, we take inspiration from the observation that deeper neural networks require exponentially fewer parameters than wider networks to achieve comparable expressiveness on compositional structures. We view all entity representations as a single-layer embedding network; conventional KGE methods that adopt high-dimensional entity representations are thus equivalent to widening the embedding network to gain expressiveness. To achieve parameter efficiency, we instead propose a deeper embedding network for entity representations, i.e., a narrow entity embedding layer plus a multi-layer dimension lifting network (LiftNet). Experiments on three public datasets show that, by integrating LiftNet, four conventional KGE methods with 16-dimensional representations achieve link prediction accuracy comparable to that of the original models with 512-dimensional representations, saving 68.4% to 96.9% of parameters.
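The wide-versus-deep trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only specifies a narrow entity embedding layer followed by a multi-layer lifting network, so the entity count (10,000), the single hidden layer of width 64, the ReLU activation, and the helper names `init_matrix`, `lift`, and `count_params` are all assumptions made for illustration. Only the 16 and 512 dimensions come from the abstract, and only entity parameters are counted, so the resulting ratio should not be read as reproducing the reported savings.

```python
import random


def init_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def lift(vector, layers):
    """Map a narrow vector to a wide one: linear layers with ReLU in between.

    The activation and the absence of biases are assumptions of this sketch.
    """
    out = vector
    for i, weights in enumerate(layers):
        # weights has shape (in_dim, out_dim); zip(*weights) iterates its columns.
        out = [sum(x * w for x, w in zip(out, col)) for col in zip(*weights)]
        if i < len(layers) - 1:
            out = [max(0.0, x) for x in out]
    return out


def count_params(num_entities, narrow_dim, layer_dims):
    """Entity-side parameters: narrow embedding table plus shared lifting weights."""
    dims = [narrow_dim] + layer_dims
    lifting = sum(a * b for a, b in zip(dims, dims[1:]))
    return num_entities * narrow_dim + lifting


rng = random.Random(0)
NUM_ENTITIES = 10_000                              # hypothetical graph size
NARROW_DIM, HIDDEN_DIM, WIDE_DIM = 16, 64, 512     # HIDDEN_DIM is an assumption

dims = [NARROW_DIM, HIDDEN_DIM, WIDE_DIM]
layers = [init_matrix(a, b, rng) for a, b in zip(dims, dims[1:])]
entity_table = init_matrix(NUM_ENTITIES, NARROW_DIM, rng)

# Lift one 16-dimensional entity vector to 512 dimensions.
lifted = lift(entity_table[0], layers)

wide_params = NUM_ENTITIES * WIDE_DIM              # conventional wide table
deep_params = count_params(NUM_ENTITIES, NARROW_DIM, [HIDDEN_DIM, WIDE_DIM])
saving = 1 - deep_params / wide_params

print(f"lifted dimension: {len(lifted)}")
print(f"wide table: {wide_params}, narrow table + lifting: {deep_params}")
print(f"entity-parameter saving: {saving:.1%}")
```

Because the lifting weights are shared across all entities, their cost is fixed while the embedding table grows with the number of entities; under these assumed sizes the saving therefore increases with graph size.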

Distilling Word Embeddings: An Encoding Approach

DistilE: Distiling Knowledge Graph Embeddings for Faster and Cheaper Reasoning

The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning

Knowledge-Powered Deep Learning for Word Embedding

Distilling the Knowledge in a Neural Network

EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

Visualizing the embedding space to explain the effect of knowledge distillation

Unraveling Key Factors of Knowledge Distillation

Distilling Linguistic Context for Language Model Compression

Distilling Structured Knowledge into Embeddings for Explainable and Accurate Recommendation

Learning Effective Word Embedding Using Morphological Word Similarity

Word Equations: Inherently Interpretable Sparse Word Embeddings through Sparse Coding

DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings

From Wide to Deep: Dimension Lifting Network for Parameter-efficient Knowledge Graph Embedding

Self-Distillation: Towards Efficient and Compact Neural Networks

Learning Better Word Embedding by Asymmetric Low-Rank Projection of Knowledge Graph

Compressing Neural Language Models by Sparse Word Representations

Distilling Holistic Knowledge with Graph Neural Networks

Transferable and Differentiable Discrete Network Embedding for multi-domains with Hierarchical Knowledge Distillation