Abstract: Aerospace manufacturing companies, such as Thales Alenia Space, design, develop, integrate, verify, and validate products characterized by high complexity and low volume. They carefully document all phases for each product but analyses across products are challenging due to the heterogeneity and unstructured nature of the data in documents. In this paper, we propose a hybrid methodology that leverages Knowledge Graphs (KGs) in conjunction with Large Language Models (LLMs) to extract and validate data contained in these documents. We consider a case study focused on test data related to electronic boards for satellites. To do so, we extend the Semantic Sensor Network ontology. We store the metadata of the reports in a KG, while the actual test results are stored in parquet accessible via a Virtual Knowledge Graph. The validation process is managed using an LLM-based approach. We also conduct a benchmarking study to evaluate the performance of state-of-the-art LLMs in executing this task. Finally, we analyze the costs and benefits of automating preexisting processes of manual data extraction and validation for subsequent cross-report analyses.

What problem does this paper attempt to address?

The paper addresses the extraction and validation of test data in the aerospace manufacturing industry. Specifically, it focuses on how to extract and validate test data from test reports for satellite electronic boards (particularly printed circuit boards, PCBs).

### Main Problems Addressed by the Paper

1. **Challenges in Data Extraction**: Test reports (mainly in .docx and .pdf formats) are highly fragmented, heterogeneous, and unstructured, so extracting test data from them by hand is time-consuming and error-prone.
2. **Difficulties in Data Validation**: Test results are currently validated mostly by hand, which is costly and inefficient. Automating this step is hard because the data is so heterogeneous that traditional validation based on regular expressions cannot handle it.

### Proposed Method

The paper proposes a hybrid approach that combines Large Language Models (LLMs) and Knowledge Graphs (KGs) to address these issues:

1. **Utilizing Knowledge Graphs (KGs)**: Create a knowledge graph that extends the Semantic Sensor Network (SSN) ontology to capture the semantics of the data and manage structural heterogeneity. The metadata of the test reports is stored in the knowledge graph, while the actual test results are stored in structured data storage.
2. **Using Large Language Models (LLMs) for Validation**: Use LLMs to automatically validate the consistency of test data, handling its syntactic and structural heterogeneity. Data engineers can then focus on the data points that the LLMs flag as anomalies, which reduces their workload.
3. **Virtual Knowledge Graph (VKG) for Data Access**: Construct a virtual knowledge graph that maps the data storage to the ontology, so that users can access validated test data directly through SPARQL queries. This simplifies data integration and analysis.

Through this approach, the paper aims to increase the automation of test data extraction and validation, thereby reducing human errors, speeding up data analysis, and ultimately improving production efficiency and product quality.
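The validate-then-triage workflow summarized above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Measurement` fields, the `low..high` notation for limits, and `stub_validator`, a deterministic placeholder that stands in for the LLM call so the example runs offline. It is not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Measurement:
    """One extracted test result (hypothetical schema, not from the paper)."""
    report_id: str         # report metadata, the kind of field a KG would hold
    board_id: str
    parameter: str
    raw_value: str         # value exactly as written in the report
    raw_limits: str        # acceptance limits, assumed here to look like "low..high"
    declared_outcome: str  # outcome stated in the report, e.g. "PASS"


# A verdict is (is_consistent, explanation).
Verdict = Tuple[bool, str]
Validator = Callable[[Measurement], Verdict]


def stub_validator(rec: Measurement) -> Verdict:
    """Placeholder for an LLM call: recomputes the outcome from value and limits."""
    try:
        low, high = (float(x) for x in rec.raw_limits.split(".."))
        value = float(rec.raw_value)
    except ValueError:
        # Anything the stub cannot parse is flagged for human review.
        return False, "could not interpret value or limits"
    expected = "PASS" if low <= value <= high else "FAIL"
    if expected != rec.declared_outcome.strip().upper():
        return False, f"declared {rec.declared_outcome!r}, expected {expected!r}"
    return True, "consistent"


def triage(records: Iterable[Measurement],
           validator: Validator) -> List[Tuple[Measurement, str]]:
    """Return only the records the validator flags, with its explanation."""
    flagged = []
    for rec in records:
        ok, reason = validator(rec)
        if not ok:
            flagged.append((rec, reason))
    return flagged


records = [
    Measurement("TR-001", "PCB-A", "supply voltage", "3.3", "3.0..3.6", "PASS"),
    Measurement("TR-002", "PCB-B", "supply voltage", "3.9", "3.0..3.6", "PASS"),
    Measurement("TR-003", "PCB-C", "supply voltage", "approx. 3.3 V", "3.0..3.6", "PASS"),
]

flagged = triage(records, stub_validator)
for rec, reason in flagged:
    print(f"{rec.report_id}: {reason}")
```

Passing the validator in as a callable keeps the triage step independent of how validation is done, so the stub could be swapped for a function that prompts an LLM. The third record shows where a strict parser gives up on free-form text, which is the kind of heterogeneity the summary says motivates an LLM-based validator.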

Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test Data

KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction

Combining Knowledge Graphs and Large Language Models

Empowering Large Language Models in Hybrid Intelligence Systems through Data-Centric Process Models

Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes

Enhancing Knowledge Graph Consistency through Open Large Language Models: A Case Study

Joint Knowledge Graph and Large Language Model for Fault Diagnosis and Its Application in Aviation Assembly

Knowledge-based Consistency Testing of Large Language Models

Are We Testing or Being Tested? Exploring the Practical Applications of Large Language Models in Software Testing

Large language models as oracles for instantiating ontologies with domain-specific knowledge

Fine-tuning Large Enterprise Language Models via Ontological Reasoning

Leveraging LLM for Automated Ontology Extraction and Knowledge Graph Generation

Assessing SPARQL capabilities of Large Language Models

From human experts to machines: An LLM supported approach to ontology and knowledge graph construction

Leveraging Knowledge Graphs and LLMs to Support and Monitor Legislative Systems

Enhancing Supply Chain Visibility with Knowledge Graphs and Large Language Models

Semantic Integration of Bosch Manufacturing Data Using Virtual Knowledge Graphs

Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets