LLM2KB: Constructing Knowledge Bases using instruction tuned context aware Large Language Models

Anmol Nayak, Hari Prasad Timmapathini

2023-08-25

Abstract: The advent of Large Language Models (LLMs) has revolutionized the field of natural language processing, enabling significant progress in various applications. One key area of interest is the construction of Knowledge Bases (KBs) using these powerful models. Knowledge bases serve as repositories of structured information, facilitating information retrieval and inference tasks. Our paper proposes LLM2KB, a system for constructing knowledge bases using large language models, with a focus on the Llama 2 architecture and the Wikipedia dataset. We perform parameter-efficient instruction tuning for Llama-2-13b-chat and StableBeluga-13B by training small injection models that have only 0.05% of the parameters of the base models using the Low Rank Adaptation (LoRA) technique. These injection models have been trained with prompts that are engineered to utilize Wikipedia page contexts of subject entities, fetched using a Dense Passage Retrieval (DPR) algorithm, to answer relevant object entities for a given subject entity and relation. Our best-performing model achieved an average F1 score of 0.6185 across 21 relations in the LM-KBC challenge held at the ISWC 2023 conference.
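To give a feel for the "0.05% of the parameters" figure, the following is a back-of-the-envelope sketch of how a LoRA adapter's size compares with a 13B base model. The hidden size (5120) and layer count (40) are those of the Llama 2 13B architecture; the rank of 8 and the choice of adapting two square projection matrices per layer are assumptions made purely for illustration, since the abstract does not state the LoRA configuration used.

```python
# Rough LoRA parameter count for a 13B-parameter Llama 2 model.
HIDDEN_SIZE = 5120               # Llama 2 13B hidden dimension
NUM_LAYERS = 40                  # Llama 2 13B decoder layers
BASE_PARAMS = 13_000_000_000     # approximate size of the base model

# Assumptions for illustration only (not taken from the abstract):
RANK = 8                         # hypothetical LoRA rank
ADAPTED_PER_LAYER = 2            # e.g. two square attention projections per layer


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two matrices per adapted weight: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


adapter_params = NUM_LAYERS * ADAPTED_PER_LAYER * lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
fraction_percent = 100 * adapter_params / BASE_PARAMS

print(f"adapter parameters: {adapter_params:,}")
print(f"share of base model: {fraction_percent:.3f}%")
```

Under these assumed settings the adapter holds a few million trainable parameters, which is the same order of magnitude as the share reported in the abstract. Other ranks or target modules would change the figure proportionally.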

Computation and Language

What problem does this paper attempt to address?

This paper proposes a method for constructing a Knowledge Base (KB) with large language models (LLMs), targeting Track 2 of the LM-KBC 2023 challenge. The resulting system, LLM2KB, builds on the Llama 2 architecture and the Wikipedia dataset. It uses two base models, Llama-2-13b-chat and StableBeluga-13B, and fine-tunes them with parameter-efficient instruction tuning via Low Rank Adaptation (LoRA), training small injection models that have only 0.05% of the base models' parameters. The goal is for the models to predict all correct object entities for a given subject entity and relation. To supply background knowledge about each subject entity, a Dense Passage Retrieval (DPR) algorithm fetches context from the subject's Wikipedia page, and the prompts are engineered to use that context. The best-performing model achieved an average F1 score of 0.6185 across 21 relations in the LM-KBC 2023 challenge. The paper's contribution is therefore the combination of retrieved Wikipedia context with LoRA-based instruction tuning for automatic knowledge base construction.
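The average F1 score mentioned above can be illustrated with a small sketch. It assumes a macro-averaged scheme: F1 is computed per (subject, relation) row over the set of predicted objects, averaged within each relation, and then averaged across relations. The handling of empty answers, the function names, and the toy rows are assumptions for illustration and are not taken from the paper or the official scoring script.

```python
from collections import defaultdict


def f1_for_row(predicted, gold):
    """Set-based F1 for one (subject, relation) row."""
    pred, gold = set(predicted), set(gold)
    if not pred and not gold:
        return 1.0  # assumption: correctly predicting "no objects" counts as perfect
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(rows):
    """Average F1 within each relation, then average across relations."""
    per_relation = defaultdict(list)
    for relation, predicted, gold in rows:
        per_relation[relation].append(f1_for_row(predicted, gold))
    relation_means = {r: sum(s) / len(s) for r, s in per_relation.items()}
    overall = sum(relation_means.values()) / len(relation_means)
    return overall, relation_means


# Toy rows: (relation, predicted objects, gold objects). Illustrative only.
rows = [
    ("CountryHasOfficialLanguage", ["English"], ["English", "French"]),
    ("CountryHasOfficialLanguage", ["German", "French"], ["German"]),
    ("PersonPlaysInstrument", [], []),
]

overall, by_relation = macro_f1(rows)
print(by_relation)
print(round(overall, 4))
```

Averaging per relation first means a relation with many subjects does not dominate the final score, which matters when results are reported across 21 relations of different sizes.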
