Iterative Zero-Shot LLM Prompting for Knowledge Graph Construction

Salvatore Carta, Alessandro Giuliani, Leonardo Piano, Alessandro Sebastian Podda, Livio Pompianu, Sandro Gabriele Tiddia
2023-07-04
Abstract: In the current digitalization era, capturing and effectively representing knowledge is crucial in most real-world scenarios. In this context, knowledge graphs represent a potent tool for retrieving and organizing a vast amount of information in a properly interconnected and interpretable structure. However, their generation is still challenging and often requires considerable human effort and domain expertise, hampering the scalability and flexibility across different application fields. This paper proposes an innovative knowledge graph generation approach that leverages the potential of the latest generative large language models, such as GPT-3.5, that can address all the main critical issues in knowledge graph building. The approach is conveyed in a pipeline that comprises novel iterative zero-shot and external knowledge-agnostic strategies in the main stages of the generation process. Our unique manifold approach may encompass significant benefits to the scientific community. In particular, the main contribution can be summarized by: (i) an innovative strategy for iteratively prompting large language models to extract relevant components of the final graph; (ii) a zero-shot strategy for each prompt, meaning that there is no need for providing examples for "guiding" the prompt result; (iii) a scalable solution, as the adoption of LLMs avoids the need for any external resources or human expertise. To assess the effectiveness of our proposed model, we performed experiments on a dataset that covered a specific domain. We claim that our proposal is a suitable solution for scalable and versatile knowledge graph construction and may be applied to different and novel contexts.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper attempts to address several key issues in the construction of knowledge graphs (KGs), including data acquisition, data quality, scalability, subjectivity and contextual knowledge, semantic disambiguation, domain expertise requirements, and dependency on external resources. Specifically:

1. **Data Acquisition and Quality**: How to effectively extract, analyze, and enhance information from various text data sources while ensuring the quality of the extracted information, avoiding issues caused by incorrect or outdated information, incomplete or missing data, unreliable sources, or contradictory data.
2. **Scalability**: How to define effective strategies for generating knowledge graphs that contain millions or even billions of entities and relationships when dealing with large-scale datasets.
3. **Semantic Disambiguation**: How to perform appropriate word sense disambiguation, entity resolution, and linking so that knowledge is represented correctly.
4. **Domain Expertise**: How to generate high-quality knowledge graphs without the need for domain experts.
5. **Dependency on External Resources**: How to generate relevant and appropriate triples without relying on external knowledge bases or Open Information Extraction (OpenIE) methods.
6. **Evaluation**: How to properly evaluate the generated knowledge graphs in the absence of prior gold standards or specific benchmarks.

To address these issues, the authors propose an iterative zero-shot large language model (LLM) prompting method that constructs knowledge graphs automatically, without relying on external knowledge bases or human effort. The specific contributions include:

1. **Iterative LLM Prompting Pipeline**: Through a series of carefully designed prompts, the LLM identifies relevant entities, extracts their descriptions and types, identifies meaningful relationships and their descriptions, and generates the relevant triples.
2. **Zero-Shot Approach**: None of the designed prompts requires examples or external knowledge bases to infer the relevant information.
3. **Automated Entity/Predicate Resolution**: Entities and predicates are resolved reliably without relying on third-party resources.
4. **Large-Scale Data Processing Capability**: Because neither human effort nor example documents are needed, the method can handle large-scale data.
5. **Evaluation Method**: A benchmark is constructed manually from the results of multiple prompts so that additional evaluation metrics can be applied.

In summary, the paper proposes an innovative, fully automated method for constructing knowledge graphs, aiming to improve the efficiency and quality of knowledge graph construction while reducing dependence on human intervention and external resources.