HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses

Xinke Jiang, Ruizhe Zhang, Yongxin Xu, Rihong Qiu, Yue Fang, Zhiyuan Wang, Jinyi Tang, Hongxin Ding, Xu Chu, Junfeng Zhao, Yasha Wang
DOI: https://doi.org/10.48550/arXiv.2312.15883
2024-04-19
Abstract: In this paper, we investigate the retrieval-augmented generation (RAG) based on Knowledge Graphs (KGs) to improve the accuracy and reliability of Large Language Models (LLMs). Recent approaches suffer from insufficient and repetitive knowledge retrieval, tedious and time-consuming query parsing, and monotonous knowledge utilization. To this end, we develop a Hypothesis Knowledge Graph Enhanced (HyKGE) framework, which leverages LLMs' powerful reasoning capacity to compensate for the incompleteness of user queries, optimizes the interaction process with LLMs, and provides diverse retrieved knowledge. Specifically, HyKGE explores the zero-shot capability and the rich knowledge of LLMs with Hypothesis Outputs to extend feasible exploration directions in the KGs, as well as the carefully curated prompt to enhance the density and efficiency of LLMs' responses. Furthermore, we introduce the HO Fragment Granularity-aware Rerank Module to filter out noise while ensuring the balance between diversity and relevance in retrieved knowledge. Experiments on two Chinese medical multiple-choice question datasets and one Chinese open-domain medical Q&A dataset with two LLM turbos demonstrate the superiority of HyKGE in terms of accuracy and explainability.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper aims to address the issues encountered by large language models (LLMs) in medical question answering, specifically including challenges such as insufficient accuracy, limited interpretability, and inadequate ability to handle domain-specific or highly specialized queries. To tackle these problems, the authors propose a framework named HyKGE (Hypothesis Knowledge Graph Enhanced).

### Main Issues:

1. **Accuracy and Interpretability**: Although large language models have made significant progress in natural language understanding and generation, their application in the medical field still faces issues of factual errors (i.e., hallucinations) and lack of interpretability.
2. **Data Constraints**: Including token resource limitations, high training costs, and privacy issues.
3. **Outdated Knowledge**: The knowledge base of large language models may not be up-to-date.
4. **Lack of Domain Expertise**: For highly specialized queries in specific domains, the performance of LLMs is not ideal.

### Solutions:

- **Pre-retrieval Stage**: Utilize the zero-shot capability and rich knowledge of LLMs to compensate for the incompleteness of user queries and explore feasible retrieval directions through the Hypothesis Output Module (HOM).
- **Post-retrieval Stage**: Propose a Hypothesis Output Fragment Granularity-aware Rerank Module to balance the relevance and diversity of retrieved knowledge, avoiding information redundancy and noise.
- **Experimental Validation**: Validate the superiority of HyKGE in terms of accuracy and interpretability through experiments on two Chinese medical multiple-choice question datasets and one Chinese open-domain medical question answering dataset.
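
The pre-retrieval and post-retrieval stages described above can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the hypothesis output is already available as a plain string (in HyKGE it would come from an LLM), and it swaps in a simple token-overlap (Jaccard) score for the rerank model. All function names, the example query, and the knowledge chains are hypothetical.

```python
import re


def split_fragments(text):
    """Split text into sentence-like fragments on '.', ';', '?' and '!'."""
    parts = re.split(r"[.;?!]\s*", text)
    return [p.strip() for p in parts if p.strip()]


def tokens(text):
    """Lowercased alphanumeric tokens; arrows and punctuation are ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rerank(query, hypothesis, chains, top_k=2):
    """Score each knowledge chain against fragments of the query and the
    hypothesis output, then keep the top_k chains with a non-zero score.

    Assumption: Jaccard overlap is a stand-in for a learned reranker.
    """
    fragments = split_fragments(query) + split_fragments(hypothesis)
    scored = [
        (max((jaccard(chain, frag) for frag in fragments), default=0.0), chain)
        for chain in chains
    ]
    # Python's sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chain for score, chain in scored[:top_k] if score > 0]


# Hypothetical inputs: the hypothesis would normally be generated by an LLM.
query = "What causes persistent cough and fever?"
hypothesis = (
    "Persistent cough with fever may indicate pneumonia. "
    "Pneumonia is often treated with antibiotics."
)
chains = [
    "pneumonia -> symptom -> cough",
    "pneumonia -> treated with -> antibiotics",
    "fracture -> treatment -> cast",
]

for chain in rerank(query, hypothesis, chains):
    print(chain)
```

Scoring against fragments rather than the whole hypothesis lets a short knowledge chain match one specific sentence without being diluted by the rest of the text. The unrelated fracture chain shares no tokens with any fragment, so it is filtered out as noise.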