Knowledge Graph-based Thought: a knowledge graph enhanced LLMs framework for pan-cancer question answering

Yichun Feng,Lu Zhou,Yikai Zheng,Ruikun He,Chao Ma,Yixue Li
DOI: https://doi.org/10.1101/2024.04.17.589873
2024-12-21
Abstract: Background. In recent years, Large Language Models (LLMs) have shown promise in various domains, notably in biomedical sciences. However, their real-world application is often limited by issues like erroneous outputs and hallucinatory responses. Results. We developed the Knowledge Graph-based Thought (KGT) framework, an innovative solution that integrates LLMs with Knowledge Graphs (KGs) to improve their initial responses by utilizing verifiable information from KGs, thus significantly reducing factual errors in reasoning. The KGT framework demonstrates strong adaptability and performs well across various open-source LLMs. Notably, KGT can facilitate the discovery of new uses for existing drugs through potential drug-cancer associations, and can assist in predicting resistance by analyzing relevant biomarkers and genetic mechanisms. To evaluate the Knowledge Graph Question Answering task within biomedicine, we utilize a pan-cancer knowledge graph to develop a pan-cancer question answering benchmark, named the Pan-cancer Question Answering (PcQA). Conclusions. The KGT framework substantially improves the accuracy and utility of LLMs in the biomedical field, demonstrating its exceptional performance in biomedical question answering.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses factual hallucination in large language models (LLMs) applied to biomedicine. LLMs can produce incorrect statements when they lack sufficient background information or cannot accurately retrieve factual knowledge, which limits their use in real-world scenarios that demand high accuracy.

To address this, the authors developed a framework named Knowledge Graph-based Thought (KGT). KGT combines LLMs with Knowledge Graphs (KGs) and uses the verifiable information in KGs to improve the initial responses of LLMs, which the authors report significantly reduces factual errors in reasoning. The framework demonstrates strong adaptability and performs well across various open-source LLMs. KGT can also assist in predicting drug resistance by analyzing relevant biomarkers and genetic mechanisms, and can help discover new uses for existing drugs through potential drug-cancer associations.

The main contributions of the KGT framework are:

1. **Improved accuracy**: By integrating KGs and LLMs, KGT significantly improves the accuracy and practicality of LLMs in the biomedical field.
2. **Flexibility and generality**: KGT is a flexible architecture that can be integrated with multiple LLMs and is easy to deploy.
3. **The first benchmark**: Using a pan-cancer knowledge graph, the authors propose the first knowledge graph question answering (KGQA) benchmark in the biomedical field, Pan-cancer Question Answering (PcQA), to evaluate KGQA tasks.

In summary, the paper aims to overcome factual hallucination in biomedical applications of LLMs by combining LLMs with KGs, providing a more reliable and accurate approach to biomedical question answering.
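
The general idea of grounding an LLM's initial response in knowledge graph facts can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy graph, the placeholder entity names (`DrugA`, `GeneX`, `CancerY`, and so on), the helper names `retrieve_triples` and `build_grounded_prompt`, the substring-based entity matching, and the prompt wording are all assumptions made for illustration, and no real LLM is called.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Toy knowledge graph with placeholder entities. These are NOT real
# biomedical facts; they only illustrate the data shape.
TOY_KG: List[Triple] = [
    ("DrugA", "targets", "GeneX"),
    ("GeneX", "associated_with", "CancerY"),
    ("DrugB", "targets", "GeneZ"),
]


def retrieve_triples(question: str, kg: List[Triple], hops: int = 2) -> List[Triple]:
    """Collect triples within `hops` steps of any entity named in the question.

    Entity linking here is a naive case-insensitive substring match, which is
    an assumption of this sketch rather than anything described in the paper.
    """
    q = question.lower()
    frontier = {e for h, _, t in kg for e in (h, t) if e.lower() in q}
    found: List[Triple] = []
    for _ in range(hops):
        new = [tr for tr in kg
               if (tr[0] in frontier or tr[2] in frontier) and tr not in found]
        found.extend(new)
        # Grow the frontier with every entity touched in this hop.
        frontier |= {e for h, _, t in new for e in (h, t)}
    return found


def build_grounded_prompt(question: str, initial_answer: str,
                          evidence: List[Triple]) -> str:
    """Format a prompt asking a model to revise its draft using KG evidence."""
    facts = "\n".join(f"- {h} {r} {t}" for h, r, t in evidence)
    return (
        f"Question: {question}\n"
        f"Draft answer: {initial_answer}\n"
        f"Verified facts from the knowledge graph:\n{facts}\n"
        "Revise the draft so that it is consistent with the facts above."
    )


question = "Which drugs might be relevant to CancerY?"
evidence = retrieve_triples(question, TOY_KG)
prompt = build_grounded_prompt(question, "DrugB may be relevant.", evidence)
```

In this sketch the two-hop retrieval links `CancerY` to `DrugA` through `GeneX`, while the unrelated `DrugB` triple is left out, so the revision prompt carries only evidence connected to the question. A real system would replace the substring matching with proper entity linking and the in-memory list with a graph database query.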