SAC-KG: Exploiting Large Language Models as Skilled Automatic Constructors for Domain Knowledge Graphs

Hanzhu Chen, Xu Shen, Qitan Lv, Jie Wang, Xiaoqi Ni, Jieping Ye
2024-09-22
Abstract: Knowledge graphs (KGs) play a pivotal role in knowledge-intensive tasks across specialized domains, where the acquisition of precise and dependable knowledge is crucial. However, existing KG construction methods heavily rely on human intervention to attain qualified KGs, which severely hinders the practical applicability in real-world scenarios. To address this challenge, we propose a general KG construction framework, named SAC-KG, to exploit large language models (LLMs) as Skilled Automatic Constructors for domain Knowledge Graph. SAC-KG effectively involves LLMs as domain experts to generate specialized and precise multi-level KGs. Specifically, SAC-KG consists of three components: Generator, Verifier, and Pruner. For a given entity, Generator produces its relations and tails from raw domain corpora, to construct a specialized single-level KG. Verifier and Pruner then work together to ensure precision by correcting generation errors and determining whether newly produced tails require further iteration for the next-level KG. Experiments demonstrate that SAC-KG automatically constructs a domain KG at the scale of over one million nodes and achieves a precision of 89.32%, leading to a superior performance with over 20% increase in precision rate compared to existing state-of-the-art methods for the KG construction task.
Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses the heavy reliance of existing knowledge graph (KG) construction methods on human intervention, which greatly limits their applicability in practical scenarios. Specifically:

1. **High demand for human intervention**: Current KG construction methods require a large amount of human participation to ensure the quality of the generated graph. This consumes time and resources and makes the methods difficult to apply at scale.
2. **Challenges of automation and precision**: Existing methods based on large language models (LLMs) can automate KG construction to a certain extent, but they face two main problems:
   - **Contextual noise**: Extracting triples directly from the raw text pulls in a large amount of domain-irrelevant information, which may interfere with the performance of the LLM.
   - **Knowledge hallucination**: The LLM may generate inaccurate or meaningless content, and these errors propagate to the next level of the KG, affecting overall reliability.

To solve these problems, the authors propose a general-purpose framework named SAC-KG, which uses LLMs as domain experts to construct domain knowledge graphs automatically and accurately. SAC-KG addresses the problems above through three core components (Generator, Verifier, and Pruner) and significantly improves the quality and efficiency of KG construction.

### Specific solutions

- **Generator**: Retrieves the most relevant context from the raw domain corpus and open knowledge graphs and generates a single-level KG, reducing interference from domain-irrelevant information.
- **Verifier**: Detects and corrects errors in the generated triples using a rule base, ensuring the accuracy of the current-level KG.
- **Pruner**: Decides whether each generated tail entity should be expanded at the next level, improving the controllability and precision of multi-level KG construction.

The experimental results show that SAC-KG can automatically construct a KG with more than one million nodes and achieve a precision of 89.32%, more than 20% higher than the precision of existing state-of-the-art methods.

In summary, the main goal of this paper is to make KG construction automatic, specialized, and highly precise through the SAC-KG framework, thereby overcoming the dependence on human intervention and the inefficiency of existing methods.
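
The level-by-level interplay of Generator, Verifier, and Pruner described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `build_kg`, the toy lookup table standing in for an LLM-backed Generator, the simple rule checks standing in for the Verifier's rule base, and the digit heuristic standing in for the Pruner are all hypothetical placeholders chosen only to make the control flow runnable.

```python
from collections import deque
from typing import Callable, Optional

Triple = tuple[str, str, str]  # (head, relation, tail)


def build_kg(
    root: str,
    generate: Callable[[str], list[Triple]],
    verify: Callable[[Triple], Optional[Triple]],
    should_expand: Callable[[str], bool],
    max_depth: int = 3,
) -> list[Triple]:
    """Grow a multi-level KG breadth-first from `root` (illustrative only)."""
    triples: list[Triple] = []
    seen = {root}
    frontier = deque([(root, 0)])
    while frontier:
        entity, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        # Generator: propose a single-level KG for this entity.
        for candidate in generate(entity):
            # Verifier: repair the triple, or reject it by returning None.
            fixed = verify(candidate)
            if fixed is None:
                continue
            triples.append(fixed)
            tail = fixed[2]
            # Pruner: decide whether the tail becomes a head at the next level.
            if tail not in seen and should_expand(tail):
                seen.add(tail)
                frontier.append((tail, depth + 1))
    return triples


# Hypothetical stand-ins; a real system would call an LLM with retrieved context.
TOY_CORPUS: dict[str, list[Triple]] = {
    "rice": [
        ("rice", "has_disease", " rice blast "),
        ("rice", "grown_in", ""),  # malformed: empty tail
        ("rice", "yield", "4.5 t/ha"),
    ],
    "rice blast": [("rice blast", "caused_by", "Magnaporthe oryzae")],
}


def toy_generate(entity: str) -> list[Triple]:
    return TOY_CORPUS.get(entity, [])


def toy_verify(triple: Triple) -> Optional[Triple]:
    head, relation, tail = (part.strip() for part in triple)
    if not head or not relation or not tail or head == tail:
        return None  # reject triples with empty fields or self-loops
    return (head, relation, tail)  # whitespace-normalized triple


def toy_should_expand(tail: str) -> bool:
    # Assumed heuristic: numeric-looking tails are attribute values, not entities.
    return not any(ch.isdigit() for ch in tail)


if __name__ == "__main__":
    for triple in build_kg("rice", toy_generate, toy_verify, toy_should_expand):
        print(triple)
```

In this sketch the Verifier runs before the Pruner, so a rejected triple never contributes a tail to the next level. This mirrors the summary's concern that hallucinated content would otherwise spread to deeper levels of the graph.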