Customized Information and Domain-centric Knowledge Graph Construction with Large Language Models

Frank Wawrzik, Matthias Plaue, Savan Vekariya, Christoph Grimm
2024-09-30
Abstract: In this paper we propose a novel approach based on knowledge graphs to provide timely access to structured information, to enable actionable technology intelligence, and to improve cyber-physical systems planning. Our framework encompasses a text mining process, which includes information retrieval, keyphrase extraction, semantic network creation, and topic map visualization. Following this data exploration process, we employ a selective knowledge graph construction (KGC) approach supported by an electronics and innovation ontology-backed pipeline for multi-objective decision-making with a focus on cyber-physical systems. We demonstrate the approach, which is scalable, by applying it to the domain of automotive electrical systems. Our results show that our construction process outperforms GraphGPT, as well as our bi-LSTM and transformer (REBEL) models with a pre-defined dataset, by several times in terms of class recognition, relationship construction, and correct "subclass of" categorization. Additionally, we outline reasoning applications and provide a comparison with Wikidata to show the differences and advantages of the approach.
Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The main problem this paper addresses is how to provide timely, structured access to information by constructing customized knowledge graphs, so as to make technology intelligence more actionable and improve the planning of cyber-physical systems (CPS). Specifically, the paper focuses on the following points:

1. **Providing timely and structured information access**: Planning modern cyber-physical systems requires rapidly acquiring and processing large amounts of data in order to make informed decisions. To this end, the authors propose a knowledge-graph-based approach that provides timely access to structured information.
2. **Making technology intelligence actionable**: Technology intelligence refers to assessing and tracking the development potential and behavior of technologies through data analysis. To make this intelligence actionable, the paper proposes a framework that uses large language models (LLMs) for text mining, keyphrase extraction, semantic network creation, and topic map visualization.
3. **Improving the planning of cyber-physical systems**: The complexity of cyber-physical systems calls for more efficient planning tools. The proposed method demonstrates its potential for CPS planning through multi-objective decision support, applied to the domain of automotive electrical systems.
4. **Improving the efficiency and quality of knowledge graph construction**: Traditional knowledge graph construction methods usually rely on static, domain-specific corpora, which can yield knowledge graphs that are neither concise nor complete. The paper instead pre-classifies information from heterogeneous sources to construct larger, more comprehensive knowledge graphs, and uses reasoning techniques to improve semantic accuracy.
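The summary does not spell out how extracted keyphrases become a semantic network. The following is a minimal sketch, assuming keyphrases have already been extracted per document (for example by an LLM) and assuming that two phrases are linked when they appear in the same document, with the edge weight equal to the number of such documents. The function names `build_semantic_network` and `neighbors` and the toy automotive phrases are hypothetical, not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def build_semantic_network(doc_keyphrases):
    """Build a weighted keyphrase co-occurrence network (hypothetical sketch).

    doc_keyphrases: one list of keyphrases per document, assumed to come
    from an upstream extraction step (e.g. an LLM).
    Returns a Counter mapping a sorted phrase pair (a, b) to the number of
    documents in which both phrases occur.
    """
    edges = Counter()
    for phrases in doc_keyphrases:
        # Normalize and deduplicate so each document counts a pair at most once.
        unique = sorted({p.strip().lower() for p in phrases if p.strip()})
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


def neighbors(edges, phrase, min_weight=1):
    """Return (neighbor, weight) pairs for a phrase, strongest first."""
    phrase = phrase.lower()
    result = []
    for (a, b), w in edges.items():
        if w < min_weight:
            continue
        if a == phrase:
            result.append((b, w))
        elif b == phrase:
            result.append((a, w))
    # Sort by descending weight, then alphabetically for stable output.
    return sorted(result, key=lambda item: (-item[1], item[0]))


# Hypothetical toy input: keyphrases from three documents.
docs = [
    ["48V board net", "DC-DC converter", "battery"],
    ["DC-DC converter", "battery", "wire harness"],
    ["wire harness", "zonal architecture"],
]
net = build_semantic_network(docs)
print(neighbors(net, "battery"))
# → [('dc-dc converter', 2), ('48v board net', 1), ('wire harness', 1)]
```

Document-level co-occurrence is only one possible linking criterion; sentence windows or LLM-judged relatedness are alternatives, and the paper's own pipeline may differ.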
### Specific problem descriptions

- **Limitations of traditional methods**: Existing knowledge graph construction methods often focus on specific domains and rely on dedicated corpora that are static or updated only with predefined attributes. This limits the conciseness and completeness of the resulting knowledge graphs.
- **Challenges in large-scale data processing**: Given massive amounts of data, efficiently extracting useful information and constructing high-quality knowledge graphs is a major challenge.
- **Requirements of cross-domain applications**: Different domains place different requirements on knowledge graph construction, so a general and flexible method is needed to adapt to different application scenarios.

### Solutions in the paper

The solutions proposed in the paper include the following aspects:

- **Pre-classifying information**: Data are crawled from multiple heterogeneous sources and pre-classified so that the constructed knowledge graph is both relevant and comprehensive.
- **Selective knowledge graph construction (KGC)**: A pipeline backed by electronics and innovation ontologies supports multi-objective decision-making, with a focus on cyber-physical systems.
- **Application of reasoning techniques**: Reasoning techniques improve semantic accuracy and keep the knowledge graph consistent and well structured.
- **Case-study verification**: A case study in the domain of automotive electrical systems demonstrates the effectiveness of the method, which outperforms GraphGPT as well as bi-LSTM and transformer (REBEL) models by several times in class recognition, relationship construction, and correct "subclass of" categorization.
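To illustrate how an ontology can make construction selective and where "subclass of" reasoning enters, here is a minimal sketch under stated assumptions: candidate triples (for example, produced by an LLM) are kept only if their relation exists in the ontology and their subject and object types satisfy the relation's domain and range once superclasses are inferred transitively. The class names, relation signatures, and entity types below are hypothetical, and a production pipeline would more likely rely on an OWL reasoner than on this hand-rolled closure.

```python
# Hypothetical toy ontology: direct "subclass of" edges (child -> parents).
SUBCLASS_OF = {
    "DCDCConverter": {"PowerElectronics"},
    "PowerElectronics": {"ElectronicComponent"},
    "ElectronicComponent": {"ElectricalComponent"},
    "Battery": {"EnergyStorage"},
    "EnergyStorage": {"ElectricalComponent"},
}

# Hypothetical relation signatures: relation -> (domain class, range class).
RELATIONS = {
    "suppliesPowerTo": ("ElectricalComponent", "ElectricalComponent"),
    "chargedBy": ("EnergyStorage", "ElectricalComponent"),
}


def superclasses(cls):
    """All transitive superclasses of cls, including cls itself."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def select_triples(candidates, instance_types):
    """Keep only candidate triples that conform to the ontology."""
    kept = []
    for subj, rel, obj in candidates:
        if rel not in RELATIONS:
            continue  # relation is not modeled in the ontology
        domain, rng = RELATIONS[rel]
        s_type, o_type = instance_types.get(subj), instance_types.get(obj)
        if s_type is None or o_type is None:
            continue  # untyped entity: cannot be checked, so drop it
        if domain in superclasses(s_type) and rng in superclasses(o_type):
            kept.append((subj, rel, obj))
    return kept


# Hypothetical entity typing and candidate triples.
types = {"HV battery": "Battery", "12V battery": "Battery", "DC-DC unit": "DCDCConverter"}
candidates = [
    ("HV battery", "suppliesPowerTo", "DC-DC unit"),  # conforms
    ("DC-DC unit", "chargedBy", "HV battery"),        # domain violation
    ("12V battery", "chargedBy", "DC-DC unit"),       # conforms
    ("HV battery", "locatedIn", "underbody"),         # unknown relation
    ("ECU-7", "suppliesPowerTo", "DC-DC unit"),       # untyped subject
]
kept = select_triples(candidates, types)
print(kept)
print(sorted(superclasses("DCDCConverter")))
```

Rejecting triples that violate domain and range constraints is one simple way an ontology can keep a graph concise; the inferred superclass sets are also what a "subclass of" check would compare against.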
Through these methods, the paper addresses the problems above and provides a more powerful tool for technology and innovation intelligence, particularly for planning and decision support in cyber-physical systems.