Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs

Daniel Steinigen, Roman Teucher, Timm Heine Ruland, Max Rudat, Nicolas Flores-Herr, Peter Fischer, Nikola Milosevic, Christopher Schymura, Angelo Ziletti

2024-08-06

Abstract: Recent advancements in Large Language Models (LLMs) have showcased their proficiency in answering natural language queries. However, their effectiveness is hindered by limited domain-specific knowledge, raising concerns about the reliability of their responses. We introduce a hybrid system that augments LLMs with domain-specific knowledge graphs (KGs), thereby aiming to enhance factual correctness using a KG-based retrieval approach. We focus on a medical KG to demonstrate our methodology, which includes (1) pre-processing, (2) Cypher query generation, (3) Cypher query processing, (4) KG retrieval, and (5) LLM-enhanced response generation. We evaluate our system on a curated dataset of 69 samples, achieving a precision of 78% in retrieving correct KG nodes. Our findings indicate that the hybrid system surpasses a standalone LLM in accuracy and completeness, as verified by an LLM-as-a-Judge evaluation method. This positions the system as a promising tool for applications that demand factual correctness and completeness, such as target identification -- a critical process in pinpointing biological entities for disease treatment or crop enhancement. Moreover, its intuitive search interface and ability to provide accurate responses within seconds make it well-suited for time-sensitive, precision-focused research contexts. We publish the source code together with the dataset and the prompt templates used.
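
To make the reported retrieval metric concrete, the following is a minimal sketch of one common way to compute node-level precision: the share of retrieved KG nodes that also appear in the annotated set. The abstract does not specify how per-sample scores are aggregated, so the function names and the macro-averaging in `mean_precision` are assumptions made for illustration, not the authors' evaluation code.

```python
from typing import Iterable, List, Tuple


def node_precision(retrieved: Iterable[str], expected: Iterable[str]) -> float:
    """Fraction of retrieved nodes that are in the expected (annotated) set."""
    retrieved_set = set(retrieved)
    if not retrieved_set:
        # Assumption: an empty retrieval counts as precision 0.
        return 0.0
    return len(retrieved_set & set(expected)) / len(retrieved_set)


def mean_precision(samples: List[Tuple[List[str], List[str]]]) -> float:
    """Average per-sample precision (macro average; an assumed aggregation)."""
    scores = [node_precision(retrieved, expected) for retrieved, expected in samples]
    return sum(scores) / len(scores) if scores else 0.0
```

With this definition, a query that returns four nodes of which three are annotated as correct scores 0.75 for that sample.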

Computation and Language, Information Retrieval

What problem does this paper attempt to address?

The paper addresses a limitation of large language models (LLMs) in answering domain-specific questions: because they lack domain-specific knowledge, their answers are often factually unreliable. To tackle this problem, the research team proposes a hybrid system named Fact Finder, which combines LLMs with domain-specific knowledge graphs (KGs) to improve the factual correctness and completeness of responses to scientific questions.

The main contributions of Fact Finder are:

1. An easy-to-use system that combines LLMs and knowledge graphs to answer scientific questions.
2. A dataset of manually annotated text-to-Cypher query pairs, which can be used to validate text-to-Cypher conversion systems.
3. A demonstration that current state-of-the-art LLMs can generate satisfactory Cypher queries for the life sciences domain.
4. Public release of the dataset, source code, and prompt templates.

The workflow consists of five steps: pre-processing, Cypher query generation, Cypher query processing, knowledge graph retrieval, and LLM-enhanced response generation. The researchers evaluated the system on a curated dataset of 69 samples, and the results showed that the hybrid system outperformed a standalone LLM in answer accuracy and completeness. The system also demonstrated that it can handle irrelevant or incomplete information, which further improves its reliability.

Overall, Fact Finder is intended as a tool for applications that require factual accuracy and completeness, such as target identification. Its intuitive search interface and its ability to return accurate responses within seconds make it suitable for time-sensitive, high-precision research environments.
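
The five-step workflow can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the published implementation: the LLM and the graph database are passed in as plain callables, the prompt wording is invented, the node and relationship names in the stub query are hypothetical, and the query-processing step shown here (stripping code fences and rejecting write clauses) is only one plausible form of post-processing.

```python
import re
from typing import Callable, Dict, List

LLM = Callable[[str], str]                           # prompt -> completion
GraphRunner = Callable[[str], List[Dict[str, str]]]  # Cypher -> result rows

FENCE = "`" * 3  # markdown code fence that LLMs often wrap queries in
WRITE_CLAUSES = re.compile(r"\b(CREATE|MERGE|DELETE|SET|REMOVE|DROP)\b", re.IGNORECASE)


def preprocess(question: str) -> str:
    """Step 1: normalize whitespace in the user question."""
    return " ".join(question.split())


def generate_cypher(question: str, llm: LLM) -> str:
    """Step 2: ask the LLM to translate the question into Cypher."""
    return llm(f"Translate the question into a Cypher query.\nQuestion: {question}\nCypher:")


def process_cypher(raw: str) -> str:
    """Step 3: clean the generated query and refuse anything that writes."""
    query = raw.replace(FENCE + "cypher", "").replace(FENCE, "").strip().rstrip(";")
    if WRITE_CLAUSES.search(query):
        raise ValueError("refusing to run a query that modifies the graph")
    return query


def answer(question: str, llm: LLM, run_query: GraphRunner) -> str:
    """Steps 1-5: pre-process, generate, process, retrieve, then respond."""
    clean = preprocess(question)
    query = process_cypher(generate_cypher(clean, llm))
    rows = run_query(query)  # Step 4: KG retrieval
    # Step 5: the LLM verbalizes the graph results instead of answering from memory.
    return llm(f"Question: {clean}\nGraph results: {rows}\nAnswer using only these results.")


# Stubs standing in for a real LLM and a real graph database (hypothetical data).
def fake_llm(prompt: str) -> str:
    if prompt.endswith("Cypher:"):
        return (FENCE + "cypher\nMATCH (d:drug)-[:indication]->(x:disease {name: 'disease_x'}) "
                "RETURN d.name AS drug;\n" + FENCE)
    return "Stub answer based on: " + prompt.split("Graph results: ")[1].split("\n")[0]


def fake_graph(query: str) -> List[Dict[str, str]]:
    return [{"drug": "drug_a"}]


print(answer("  Which drugs treat   disease_x? ", fake_llm, fake_graph))
```

Passing the model and the graph as callables keeps the sketch independent of any particular LLM provider or database driver; in a real deployment they would be replaced by an API client and a graph database session.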

Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity

Think and Retrieval: A Hypothesis Knowledge Graph Enhanced Medical Large Language Models

Large Language Models, scientific knowledge and factuality: A framework to streamline human expert evaluation

KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques

Leveraging A Medical Knowledge Graph into Large Language Models for Diagnosis Prediction

Efficient Knowledge Infusion via KG-LLM Alignment

Give Us the Facts: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling

KGLens: Towards Efficient and Effective Knowledge Probing of Large Language Models with Knowledge Graphs

Systematic Assessment of Factual Knowledge in Large Language Models

HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses

Large Language Models and Medical Knowledge Grounding for Diagnosis Prediction

Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs

Enhancing Large Language Models with Knowledge Graphs for Robust Question Answering

Knowing When to Ask -- Bridging Large Language Models and Data

Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators

Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering