Abstract: Knowledge graphs (KGs) are collections of real-world knowledge represented in the structured form of triples. Because KGs are built manually in their early stages, they commonly suffer from missing links (triples). Knowledge graph completion (KGC) aims to find these missing links and thereby complete the KGs. However, as knowledge accumulates from diverse sources, new entities emerge rapidly and need to be connected to existing KGs. Open-world KGC therefore aims to extend KGs to these new entities. Handling new entities is challenging because they have no connections to entities already in the KG. One approach is to embed each new entity from its textual description using pre-trained word embeddings and then score it in the graph-vector space with a conventional KGC model. These models have produced meaningful results, but few studies have exploited more recent neural networks such as pre-trained language models, which are known to capture context better than pre-trained word embeddings. This paper proposes a novel model that effectively connects new entities to existing KGs through a pre-trained language model. The model relies on two learning methods: the classification method of the masked language model (MLM), which predicts a word from a large vocabulary given its context, and multi-task learning based on the Multi-Task Deep Neural Network (MT-DNN). Building on these methods, the model first generates an embedding of a new entity from its textual description and then uses that embedding to find an existing entity in the KG to which the new entity can be connected.
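The MLM-style classification idea described above can be illustrated with a small sketch. It assumes that a pre-trained language model has already turned the new entity's description, plus a relation token, into a single query vector. An output head then scores every existing KG entity the same way an MLM output layer scores every vocabulary word, and a softmax with a cross-entropy loss trains it. Everything here is a hypothetical stand-in and not the paper's implementation: the names `TOKEN_VECS`, `ENTITY_EMB`, `encode_query`, the toy 3-dimensional vectors, and mean pooling as the "encoder". The MT-DNN multi-task component is not shown.

```python
import math

# Hypothetical toy vectors standing in for the contextual token outputs of a
# pre-trained language model; a real PLM would compute these from text.
TOKEN_VECS = {
    "german": [0.9, 0.1, 0.0],
    "river": [0.1, 0.0, 0.8],
    "flows_through": [0.0, 0.1, 0.9],  # relation name fed in as a token
}

# Hypothetical output embeddings of entities already in the KG. They play the
# role the word-embedding matrix plays in an MLM output layer.
ENTITY_EMB = {
    "Albert_Einstein": [1.0, 0.2, 0.0],
    "Berlin": [0.0, 0.1, 1.0],
    "Physics": [0.7, 0.6, 0.0],
}
# One bias per entity, as in an MLM head; zero here, learned in practice.
ENTITY_BIAS = {name: 0.0 for name in ENTITY_EMB}


def encode_query(tokens):
    """Mean-pool token vectors into one query vector (a stand-in for the
    hidden state a PLM would produce at a [MASK] position)."""
    vecs = [TOKEN_VECS[t] for t in tokens if t in TOKEN_VECS]
    if not vecs:
        raise ValueError("no known tokens in query")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def entity_logits(h):
    """Score every existing entity the way an MLM head scores every word:
    logit_j = <h, e_j> + b_j."""
    return {
        name: sum(a * b for a, b in zip(h, emb)) + ENTITY_BIAS[name]
        for name, emb in ENTITY_EMB.items()
    }


def softmax(logits):
    """Normalize logits into a probability distribution over entities."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


def cross_entropy(probs, gold):
    """Training loss for one example: negative log-probability of the gold entity."""
    return -math.log(probs[gold])


# New entity: description "german river", queried with relation "flows_through".
h = encode_query(["german", "river", "flows_through"])
probs = softmax(entity_logits(h))
prediction = max(probs, key=probs.get)  # "Berlin" with these toy vectors
loss = cross_entropy(probs, "Berlin")
print(prediction, round(loss, 3))
```

Treating the set of existing entities as the "vocabulary" is what turns linking a new entity into a classification problem. Sorting `probs` in descending order, rather than taking only the argmax, would yield a ranked list of candidate entities.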
Experimental results on three benchmark datasets, DBPedia50k, FB15k-237-OWE, and FB20k, show that the proposed model improves performance by 9.2, 4.4, and 11.1 percentage points (%p), respectively, achieving new state-of-the-art results on all three datasets.