Leveraging LLM for Automated Ontology Extraction and Knowledge Graph Generation

Mohammad Sadeq Abolhasani, Rong Pan
2024-12-01
Abstract: Extracting relevant and structured knowledge from large, complex technical documents within the Reliability and Maintainability (RAM) domain is labor-intensive and prone to errors. Our work addresses this challenge by presenting OntoKGen, a genuine pipeline for ontology extraction and Knowledge Graph (KG) generation. OntoKGen leverages Large Language Models (LLMs) through an interactive user interface guided by our adaptive iterative Chain of Thought (CoT) algorithm to ensure that the ontology extraction process and, thus, KG generation align with user-specific requirements. Although KG generation follows a clear, structured path based on the confirmed ontology, there is no universally correct ontology, as it is inherently based on the user's preferences. OntoKGen recommends an ontology grounded in best practices, minimizing user effort and providing valuable insights that may have been overlooked, all while giving the user complete control over the final ontology. Having generated the KG based on the confirmed ontology, OntoKGen enables seamless integration into schemaless, non-relational databases like Neo4j. This integration allows for flexible storage and retrieval of knowledge from diverse, unstructured sources, facilitating advanced querying, analysis, and decision-making. Moreover, the generated KG serves as a robust foundation for future integration into Retrieval Augmented Generation (RAG) systems, offering enhanced capabilities for developing domain-specific intelligent applications.
Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the fact that, in the field of Reliability and Maintainability (RAM), extracting relevant and structured knowledge from large volumes of complex technical documents is both time-consuming and error-prone. Specifically, engineers need to quickly locate specific information that may be buried in a large number of technical manuals. They also need tools to visualize data, discover hidden relationships, and perform reasoning to support decision-making. This motivates an efficient automated system that can extract, organize, and use this knowledge for various analytical purposes. The paper proposes OntoKGen, a pipeline for automated ontology extraction and knowledge graph generation based on large language models (LLMs). Through an interactive user interface and an adaptive iterative Chain of Thought (CoT) algorithm, OntoKGen keeps the ontology extraction process, and therefore the knowledge graph generation, aligned with the user's specific needs. The system reduces manual effort, surfaces insights the user may have overlooked, and leaves the user in full control of the final ontology structure. The generated knowledge graph can be integrated into a schema-free, non-relational database such as Neo4j, enabling flexible knowledge storage and retrieval and supporting advanced querying, analysis, and decision-making.
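To make the last step concrete, the following is a minimal sketch of how triples could be checked against a user-confirmed ontology and turned into Neo4j Cypher statements. It is not OntoKGen's implementation: the `Ontology` and `Triple` classes, the helper functions, and the example entities (`Pump P-101`, `Seal leakage`, `HAS_FAILURE_MODE`) are all hypothetical names chosen for illustration, and the LLM extraction and user confirmation steps are assumed to have already happened.

```python
from dataclasses import dataclass


@dataclass
class Ontology:
    """A user-confirmed ontology (hypothetical structure, for illustration)."""
    entity_types: set
    # relation name -> (allowed subject type, allowed object type)
    relation_types: dict


@dataclass
class Triple:
    """One extracted fact, typed according to the ontology."""
    subj: str
    subj_type: str
    relation: str
    obj: str
    obj_type: str


def conforms(triple, ontology):
    """Return True if the triple uses only types and relations in the ontology."""
    if triple.subj_type not in ontology.entity_types:
        return False
    if triple.obj_type not in ontology.entity_types:
        return False
    expected = ontology.relation_types.get(triple.relation)
    return expected == (triple.subj_type, triple.obj_type)


def to_cypher(triple):
    """Build a parameterized Cypher MERGE query for one triple.

    Labels and relationship types cannot be passed as Cypher parameters,
    so they are interpolated; only ontology-validated triples should reach here.
    """
    query = (
        f"MERGE (s:{triple.subj_type} {{name: $subj}}) "
        f"MERGE (o:{triple.obj_type} {{name: $obj}}) "
        f"MERGE (s)-[:{triple.relation}]->(o)"
    )
    return query, {"subj": triple.subj, "obj": triple.obj}


def build_statements(triples, ontology):
    """Keep only conforming triples and convert each to (query, params)."""
    return [to_cypher(t) for t in triples if conforms(t, ontology)]


# Hypothetical example data, not taken from the paper.
ontology = Ontology(
    entity_types={"Component", "FailureMode"},
    relation_types={"HAS_FAILURE_MODE": ("Component", "FailureMode")},
)
triples = [
    Triple("Pump P-101", "Component", "HAS_FAILURE_MODE", "Seal leakage", "FailureMode"),
    # Subject and object types are reversed, so this triple is rejected.
    Triple("Seal leakage", "FailureMode", "HAS_FAILURE_MODE", "Pump P-101", "Component"),
]
statements = build_statements(triples, ontology)
```

Each `(query, params)` pair could then be executed through a Neo4j client session. Using `MERGE` rather than `CREATE` keeps repeated loads from duplicating nodes, which matters when the same entity appears in several source documents.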