Abstract: In the current digitalization era, capturing and effectively representing knowledge is crucial in most real-world scenarios. In this context, knowledge graphs are a potent tool for retrieving and organizing vast amounts of information in a properly interconnected and interpretable structure. However, their generation is still challenging and often requires considerable human effort and domain expertise, hampering scalability and flexibility across different application fields. This paper proposes an innovative knowledge graph generation approach that leverages the potential of the latest generative large language models (LLMs), such as GPT-3.5, to address the main critical issues in knowledge graph building. The approach is conveyed in a pipeline that comprises novel iterative zero-shot and external knowledge-agnostic strategies in the main stages of the generation process. Our multifaceted approach may offer significant benefits to the scientific community. In particular, the main contributions can be summarized as: (i) an innovative strategy for iteratively prompting large language models to extract relevant components of the final graph; (ii) a zero-shot strategy for each prompt, meaning that no examples need to be provided to "guide" the prompt result; (iii) a scalable solution, as the adoption of LLMs avoids the need for any external resources or human expertise. To assess the effectiveness of our proposed model, we performed experiments on a dataset that covered a specific domain. We claim that our proposal is a suitable solution for scalable and versatile knowledge graph construction and may be applied to different and novel contexts.

What problem does this paper attempt to address?

The paper attempts to address several key issues in the construction of Knowledge Graphs (KG), including but not limited to data acquisition, data quality, scalability, subjectivity and contextual knowledge, semantic disambiguation, domain expertise requirements, and dependency on external resources. Specifically:

1. **Data Acquisition and Quality**: How to effectively extract, analyze, and enhance information from various text data sources while ensuring the quality of the extracted information, avoiding issues caused by incorrect or outdated information, incomplete or missing data, unreliable sources, or contradictory data.
2. **Scalability**: How to define effective strategies to generate knowledge graphs containing millions or even billions of entities and relationships when dealing with large-scale datasets.
3. **Semantic Disambiguation**: How to perform appropriate word sense disambiguation, entity resolution, and linking to correctly represent knowledge.
4. **Domain Expertise**: How to generate high-quality knowledge graphs without the need for specific domain experts.
5. **Dependency on External Resources**: How to generate relevant and appropriate triples without relying on external knowledge bases or Open Information Extraction (OpenIE) methods.
6. **Evaluation**: How to properly evaluate the generated knowledge graphs in the absence of prior gold standards or specific benchmarks.

To address the above issues, the authors propose an iterative zero-shot large language model (LLM) prompting method that can automatically complete the construction of knowledge graphs without relying on any external knowledge bases or human effort. The specific contributions include:

1. **Iterative LLM Prompting Pipeline**: Through a series of carefully designed prompts, the LLM can automatically identify relevant entities, extract descriptions and types, identify meaningful relationships and their descriptions, and generate relevant triples.
2. **Zero-Shot Approach**: None of the designed prompts requires examples or external knowledge bases to infer relevant information.
3. **Automated Entity/Predicate Resolution**: Entities and predicates are resolved without relying on third-party resources.
4. **Large-Scale Data Processing Capability**: Since neither human effort nor example documents are needed, the method can handle large-scale data.
5. **Evaluation Method**: Benchmarks are constructed manually from the results of multiple prompts so that additional evaluation metrics can be applied.

In summary, the paper proposes an innovative, fully automated method for constructing knowledge graphs, aiming to improve the efficiency and quality of knowledge graph construction while reducing dependence on human intervention and external resources.
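The iterative, zero-shot prompting idea summarized above can be illustrated with a minimal sketch. This is not the authors' pipeline or their prompts: the prompt wording, the `subject | predicate | object` reply format, the names `build_graph` and `fake_llm`, and the simple case-insensitive matching used as a stand-in for entity/predicate resolution are all assumptions made for illustration. The LLM is modeled as a plain function from prompt to text, so a real model client could be substituted for the canned stub.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
LLM = Callable[[str], str]  # hypothetical interface: prompt in, raw reply out


def parse_lines(text: str) -> List[str]:
    """Split an LLM reply into non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_graph(document: str, llm: LLM) -> List[Triple]:
    """Illustrative three-stage loop; each stage reuses earlier replies."""
    # Stage 1: entity extraction. The prompt contains no examples (zero-shot).
    entities = parse_lines(
        llm(f"List the entities mentioned in this text, one per line.\n\n{document}")
    )

    # Stage 2: one follow-up prompt per entity to obtain its type.
    types: Dict[str, str] = {
        e: llm(
            f"Give a one-word type for the entity '{e}' in this text.\n\n{document}"
        ).strip()
        for e in entities
    }

    # Stage 3: feed the typed entities back in and ask for relations.
    listing = ", ".join(f"{e} ({types[e]})" for e in entities)
    reply = llm(
        f"Entities: {listing}\n"
        "List relations between them as 'subject | predicate | object', "
        f"one per line.\n\n{document}"
    )

    # Naive resolution (an assumption): match subjects and objects to stage-1
    # entities ignoring case, lowercase predicates, drop everything else.
    known = {e.lower(): e for e in entities}
    triples: List[Triple] = []
    for line in parse_lines(reply):
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # malformed reply line
        subj, pred, obj = parts
        if subj.lower() in known and obj.lower() in known:
            triples.append((known[subj.lower()], pred.lower(), known[obj.lower()]))
    return triples


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("List the entities"):
        return "Marie Curie\nSorbonne\n"
    if prompt.startswith("Give a one-word type"):
        return "Person" if "'Marie Curie'" in prompt else "Organization"
    return (
        "marie curie | Worked At | Sorbonne\n"
        "Marie Curie | born in | Warsaw\n"
        "not a triple"
    )


if __name__ == "__main__":
    print(build_graph("Marie Curie worked at the Sorbonne.", fake_llm))
```

With the stub, only the first relation survives: the second names an object that stage 1 never extracted, and the third is not in the expected format. Filtering against the stage-1 entity list is one simple way to keep later prompts consistent with earlier ones without consulting an external knowledge base.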

Iterative Zero-Shot LLM Prompting for Knowledge Graph Construction

Prompt-Time Symbolic Knowledge Capture with Large Language Models

KnowGPT: Knowledge Graph based Prompting for Large Language Models

Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering

Generative Knowledge Graph Construction: A Review

Process Knowledge Extraction and Knowledge Graph Construction Through Prompting: A Quantitative Analysis

Graph Neural Prompting with Large Language Models

Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts

LLM as Prompter: Low-resource Inductive Reasoning on Arbitrary Knowledge Graphs

Assessing LLMs Suitability for Knowledge Graph Completion

Prompting as Probing: Using Language Models for Knowledge Base Construction

Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs

Biomedical knowledge graph-optimized prompt generation for large language models

Zero-shot and Few-shot Learning with Knowledge Graphs: A Comprehensive Survey

Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs

Knowledge Graph Prompting for Multi-Document Question Answering

Generating Knowledge Graphs from Large Language Models: A Comparative Study of GPT-4, LLaMA 2, and BERT

Can LLMs be Good Graph Judger for Knowledge Graph Construction?

iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models

KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion