Abstract: We present our results regarding the automatic construction of a knowledge graph from historical documents related to the Chilean dictatorship period (1973-1990). Our approach consists of using LLMs to automatically recognize entities and the relations between these entities, and also to perform resolution between these sets of values. To prevent hallucination, the interaction with the LLM is grounded in a simple ontology with 4 types of entities and 7 types of relations. To evaluate our architecture, we use a gold standard graph constructed from a small subset of the documents, and compare it to the graph obtained from our approach when processing the same set of documents. Results show that the automatic construction manages to recognize a good portion of all the entities in the gold standard, and that those not recognized are mostly explained by the level of granularity at which the information is structured in the graph, not by the automatic approach missing an important entity. Looking forward, we expect this report to encourage work on similar projects focused on enhancing research in the humanities and social sciences, but we remark that better evaluation metrics are needed in order to accurately fine-tune these types of architectures.
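The comparison against a gold standard described in the abstract can be illustrated with a minimal sketch. It assumes entities are represented as `(name, type)` pairs and are compared by exact match; the function name `entity_scores` and this representation are illustrative assumptions, not the paper's evaluation code. Exact matching is deliberately strict: as the abstract notes, many misses stem from differences in granularity, so a metric like this one would understate the quality of the automatic graph.

```python
def entity_scores(gold, predicted):
    """Per-type precision and recall for two sets of (name, type) pairs.

    Assumption: an entity counts as recognized only on an exact
    (name, type) match, which penalizes granularity differences.
    """
    scores = {}
    entity_types = sorted({etype for _, etype in gold | predicted})
    for etype in entity_types:
        gold_names = {name for name, t in gold if t == etype}
        pred_names = {name for name, t in predicted if t == etype}
        hits = len(gold_names & pred_names)
        scores[etype] = {
            "precision": hits / len(pred_names) if pred_names else 0.0,
            "recall": hits / len(gold_names) if gold_names else 0.0,
        }
    return scores


# Placeholder data, not taken from the paper.
gold = {("a", "individual"), ("b", "individual"), ("x", "event")}
predicted = {("a", "individual"), ("y", "event")}
scores = entity_scores(gold, predicted)
```

A softer matching rule (for example, treating a predicted entity as a hit when it is a part of a coarser gold entity) would be one way to account for granularity, but defining it requires the kind of better evaluation metrics the authors call for.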

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to automatically construct a knowledge graph from a large number of historical documents related to the Chilean dictatorship period (1973-1990). Specifically, the researchers use large language models (LLMs) to automatically identify entities and the relationships between them, and reduce model hallucination by grounding the extraction in a simple ontology. They also expect this method to help integrate and analyze the information in these historical documents, thereby supporting research on this important historical period.

### Research Background

Knowledge graphs have great potential for analyzing historical documents. Constructing a knowledge graph shifts the focus from documents to entities, so users can more conveniently find relevant entities and the information associated with them. However, constructing a knowledge graph by hand is time-consuming and costly: all relevant documents must be read and organized to ensure that entities and relationships are accurate.

### Main Challenges

1. **Entity Recognition**: Accurately identify all relevant entities in a large number of historical documents.
2. **Relationship Extraction**: Determine the relationships between these entities.
3. **Avoid Hallucination**: Prevent LLMs from generating inaccurate or non-existent information.
4. **Evaluate Quality**: Ensure that the quality of the automatically generated knowledge graph meets the required standard.

### Solutions

To address the above challenges, the researchers propose an LLM-based method for automatically constructing a knowledge graph. The specific steps are as follows:

1. **Use a Simple Ontology**: Define four types of entities (individuals, events, locations, organizations) and seven types of relationships (such as the relationship between individuals and organizations, or between organizations and events) to guide LLMs in entity and relationship extraction.
2. **Zero-shot Prompting**: Send specific prompts to LLMs through OpenAI's API so that they identify entities and relationships in the documents.
3. **Entity Resolution**: Remove duplicate entities and correct possible errors.
4. **Graph Post-processing**: Remove redundant and incorrect edges (relationships), and further optimize the graph structure by merging redundant nodes.

### Evaluation Methods

To verify the effectiveness of this method, the researchers used a gold-standard graph constructed by domain experts as a benchmark and compared the automatically generated graph against it. The results show that the method performs very well at recognizing individuals; for organizations, events, and locations there are some deviations, but the method still captures the main information overall.

### Future Work

The researchers plan to refine the prompting strategy to improve recognition accuracy for the different types of entities and relationships. They will also create a labeled corpus to systematically evaluate the performance of tools such as ChatGPT in entity recognition, and explore other possible methods, such as named-entity recognition algorithms.

In summary, this research aims to support historical research on the Chilean dictatorship period through automated means: filling information gaps, enhancing contextual understanding, and revealing deeper connections and patterns.
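The ontology grounding and entity-resolution steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four entity types come from the summary, but the relation names in `RELATION_SIGNATURES` are hypothetical placeholders (the paper's seven relation types are not listed here), and `normalize`, `conforms`, and `build_graph` are illustrative helpers. Entity resolution is reduced to a naive key (lowercasing, accent stripping, whitespace collapsing); the paper uses the LLM for this step.

```python
import unicodedata
from dataclasses import dataclass

# Entity types named in the summary above.
ENTITY_TYPES = {"individual", "event", "location", "organization"}

# HYPOTHETICAL relation signatures: relation name -> (subject type, object type).
# These names are placeholders, not the paper's seven relation types.
RELATION_SIGNATURES = {
    "member_of": ("individual", "organization"),
    "involved_in": ("organization", "event"),
    "occurred_at": ("event", "location"),
}


@dataclass(frozen=True)
class Entity:
    name: str
    type: str


@dataclass(frozen=True)
class Triple:
    subject: Entity
    relation: str
    object: Entity


def normalize(name: str) -> str:
    """Naive resolution key: lowercase, strip accents, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def conforms(triple: Triple) -> bool:
    """True only if the triple's types and relation are allowed by the ontology."""
    if triple.subject.type not in ENTITY_TYPES or triple.object.type not in ENTITY_TYPES:
        return False
    signature = RELATION_SIGNATURES.get(triple.relation)
    return signature == (triple.subject.type, triple.object.type)


def build_graph(triples):
    """Drop non-conforming triples, then merge entities that share a key and type."""
    edges = set()
    for t in triples:
        if not conforms(t):
            continue  # discard output that falls outside the ontology
        subj = (normalize(t.subject.name), t.subject.type)
        obj = (normalize(t.object.name), t.object.type)
        edges.add((subj, t.relation, obj))
    return edges


# Placeholder triples standing in for LLM output; not data from the paper.
triples = [
    Triple(Entity("Persona Ejemplo", "individual"), "member_of",
           Entity("Organización Ejemplo", "organization")),
    Triple(Entity("persona ejemplo", "individual"), "member_of",
           Entity("Organizacion  Ejemplo", "organization")),
    Triple(Entity("Persona Ejemplo", "individual"), "unknown_relation",
           Entity("Lugar Ejemplo", "location")),
]
edges = build_graph(triples)
```

In this example the first two triples collapse into a single edge because their names normalize to the same key, and the third is rejected because its relation is not in the ontology. Validating after extraction complements, rather than replaces, stating the ontology in the prompt itself.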

Automatic knowledge-graph creation from historical documents: The Chilean dictatorship as a case study
