iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models

Yassir Lairgi, Ludovic Moncla, Rémy Cazabet, Khalid Benabdeslem, Pierre Cléau
2024-09-05
Abstract: Most available data is unstructured, making it challenging to access valuable information. Automatically building Knowledge Graphs (KGs) is crucial for structuring data and making it accessible, allowing users to search for information effectively. KGs also facilitate insights, inference, and reasoning. Traditional NLP methods, such as named entity recognition and relation extraction, are key in information retrieval but face limitations, including the use of predefined entity types and the need for supervised learning. Current research leverages large language models' capabilities, such as zero- or few-shot learning. However, unresolved and semantically duplicated entities and relations still pose challenges, leading to inconsistent graphs and requiring extensive post-processing. Additionally, most approaches are topic-dependent. In this paper, we propose iText2KG, a method for incremental, topic-independent KG construction without post-processing. This plug-and-play, zero-shot method is applicable across a wide range of KG construction scenarios and comprises four modules: Document Distiller, Incremental Entity Extractor, Incremental Relation Extractor, and Graph Integrator and Visualization. Our method demonstrates superior performance compared to baseline methods across three scenarios: converting scientific papers to graphs, websites to graphs, and CVs to graphs.
Artificial Intelligence, Computation and Language, Information Retrieval
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses the problem of automatically constructing knowledge graphs (KGs) from unstructured text data. Specifically, it targets the following challenges and proposes a method to overcome them:

1. **Limitations of Traditional Methods**:
   - Traditional natural language processing (NLP) methods such as named entity recognition (NER) and relation extraction (RE) are limited to predefined entity types and relationships, and typically rely on supervised learning, which requires a large amount of manual annotation.
2. **Issues with Current Methods**:
   - Current methods based on large language models (LLMs) benefit from zero-shot or few-shot learning but still produce unresolved and semantically duplicated entities and relations, leading to inconsistent graphs that require extensive post-processing. Additionally, most approaches are topic-dependent and cannot be applied across different domains.
3. **Proposed New Method**:
   - The paper proposes iText2KG, a method for incrementally constructing consistent knowledge graphs from raw documents without post-processing. It is a plug-and-play, zero-shot approach suitable for a wide range of KG construction scenarios. iText2KG consists of four modules:
     - **Document Distiller**: Rewrites raw documents into semantic blocks.
     - **Incremental Entity Extractor**: Extracts and parses entities from the semantic blocks.
     - **Incremental Relation Extractor**: Detects relationships within the semantic blocks.
     - **Graph Integrator and Visualization**: Uses Neo4j to visualize these entities and relationships in graph format.

In this way, iText2KG aims to improve the efficiency and accuracy of KG construction, reduce redundant information, and ensure the uniqueness and consistency of entities and relationships.
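
The incremental idea behind the entity and relation extractors can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary above does not say how duplicates are detected, so the string similarity from `difflib`, the `0.8` threshold, and the names `IncrementalGraph`, `resolve`, and `add_block` are all assumptions made for illustration. The LLM extraction step is omitted, and the triples are supplied by hand.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] (assumption: a real system may compare embeddings)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


@dataclass
class IncrementalGraph:
    """Hypothetical container that merges new entities into the existing ones."""

    threshold: float = 0.8  # assumed cut-off, not taken from the paper
    entities: list[str] = field(default_factory=list)
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def resolve(self, name: str) -> str:
        """Return the closest known entity if it is similar enough, else register `name`."""
        best = max(self.entities, key=lambda e: similarity(e, name), default=None)
        if best is not None and similarity(best, name) >= self.threshold:
            return best
        self.entities.append(name)
        return name

    def add_block(self, triples: list[tuple[str, str, str]]) -> None:
        """Integrate the (head, relation, tail) triples extracted from one semantic block."""
        for head, relation, tail in triples:
            self.relations.add((self.resolve(head), relation, self.resolve(tail)))


graph = IncrementalGraph()
# First block: two new entities and one relation.
graph.add_block([("Yassir Lairgi", "authored", "iText2KG")])
# Second block: a differently cased mention is merged instead of duplicated.
graph.add_block([
    ("yassir lairgi", "authored", "iText2KG"),
    ("iText2KG", "uses", "Neo4j"),
])
print(graph.entities)
print(sorted(graph.relations))
```

Because each mention is resolved against the entities already in the graph before it is stored, duplicates are avoided at insertion time rather than cleaned up afterwards, which is the property the summary describes as needing no post-processing.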