Large Language Models and Medical Knowledge Grounding for Diagnosis Prediction

Yanjun Gao, Ruizhe Li, Emma Croxford, Samuel Tesch, Daniel To, John Caskey, Brian W Patterson, Matthew M Churpek, Timothy Miller, Dmitriy Dligach, Majid Afshar

DOI: https://doi.org/10.1101/2023.11.24.23298641

2024-06-18

Abstract: While large language models (LLMs) have showcased their potential in diverse language tasks, their application in healthcare must minimize diagnostic errors and prevent patient harm. A medical knowledge graph (KG) houses a wealth of structured medical concept relations sourced from authoritative references such as UMLS, making it a valuable resource for grounding the LLM diagnostic process in knowledge. In this paper, we examine the synergistic potential of LLMs and medical KGs in predicting diagnoses from electronic health records (EHR) under the retrieval-augmented generation (RAG) framework. We propose a novel graph model, DR.KNOWS, that selects the most relevant pathology knowledge paths based on the medical problem descriptions. To evaluate DR.KNOWS, we developed the first comprehensive human evaluation approach to assess the performance of LLMs for diagnosis prediction and to examine the rationale behind their decision-making processes, with the aim of improving diagnostic safety. Using real-world hospital datasets, our study enriches the discourse on the role of medical KGs in grounding medical knowledge into LLMs, revealing both challenges and opportunities in harnessing external knowledge for explainable diagnostic pathways and the realization of AI-augmented diagnostic decision support systems.
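To make the retrieval-augmented generation setup concrete, the sketch below shows one plausible way retrieved knowledge paths could be serialized into an LLM prompt next to the EHR problem description. It is a minimal illustration under stated assumptions: the path format (alternating concepts and relation labels), the function names `serialize_path` and `build_diagnosis_prompt`, the relation names, and the prompt wording are all hypothetical and are not the prompt or interface used by DR.KNOWS.

```python
from typing import List, Tuple

# Hypothetical path format: alternating concepts and relation labels,
# e.g. ("cough", "manifestation_of", "pneumonia").
Path = Tuple[str, ...]


def serialize_path(path: Path) -> str:
    """Render a path as 'concept -> relation -> concept ...'."""
    return " -> ".join(path)


def build_diagnosis_prompt(
    problem_description: str, paths: List[Path], max_paths: int = 5
) -> str:
    """Combine EHR problem text with up to `max_paths` retrieved KG paths."""
    lines = [
        "Patient problem description:",
        problem_description.strip(),
        "",
        "Possibly relevant medical knowledge paths:",
    ]
    # Number each retrieved path so the model can refer to it in its reasoning.
    for i, path in enumerate(paths[:max_paths], start=1):
        lines.append(f"{i}. {serialize_path(path)}")
    lines += [
        "",
        "List the most likely diagnoses and explain which paths support each one.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy inputs invented for illustration; not taken from the paper's datasets.
    note = "Fever, productive cough, and hypoxia for 3 days."
    retrieved = [
        ("cough", "manifestation_of", "pneumonia"),
        ("hypoxia", "associated_with", "respiratory failure"),
    ]
    print(build_diagnosis_prompt(note, retrieved))
```

Capping the number of paths keeps the prompt short. In a full pipeline, the paths would come from a path-selection component such as DR.KNOWS, which this sketch does not implement.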

Health Informatics

What problem does this paper attempt to address?

The paper primarily aims to address the following issues:

### Core Issues

- **Improving Diagnostic Accuracy and Reliability**: When large language models (LLMs) are used for automatic disease diagnosis, how can the accuracy and reliability of their diagnoses be ensured so that errors do not lead to potentially life-threatening outcomes?

### Solution Exploration

- **Integrating Medical Knowledge Graphs**: Investigating how to integrate medical knowledge graphs (KGs) into LLMs to improve diagnosis generation and make the model's output more interpretable.
- **Developing a New Graph Model**: Proposing a graph model named DR.KNOWS that selects the most relevant pathology knowledge paths from a medical KG built on authoritative sources such as the Unified Medical Language System (UMLS).
- **Evaluation Framework Design**: Designing the first comprehensive human evaluation framework to assess LLM performance in diagnosis prediction and to examine the rationale behind the models' decisions, with a particular focus on improving diagnostic safety.

### Specific Objectives

1. **Evaluate DR.KNOWS**: Assess the ability of DR.KNOWS to select the most likely diagnoses and their interpretable paths.
2. **Design and Implement a Human Evaluation Framework**: Build the first dedicated human evaluation framework for the diagnoses and reasoning generated by LLMs.
3. **Explore the Role of Knowledge Graphs**: Investigate integrating KGs into LLMs as an additional module to improve diagnosis generation.
4. **Validate the Human Evaluation Framework**: Demonstrate that the proposed framework is practical for revealing key aspects of LLMs' diagnostic performance and for supporting diagnostic safety.
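The Solution Exploration item on DR.KNOWS describes selecting the most relevant knowledge paths from a medical KG for a given case. The sketch below illustrates the general idea of multi-hop path retrieval over a concept graph under simplifying assumptions: the toy triples, the helper names (`build_graph`, `enumerate_paths`, `select_paths`), and the lexical Jaccard scoring are all hypothetical. DR.KNOWS is a learned graph model, so the scoring function here is only a plain stand-in, not the paper's method.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]  # alternating concept, relation, concept, ...


def build_graph(triples: List[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Adjacency list mapping a head concept to its (relation, tail) edges."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
    return graph


def enumerate_paths(graph, sources: List[str], max_hops: int) -> List[Path]:
    """All cycle-free paths of 1..max_hops edges starting at a source concept."""
    paths: List[Path] = []
    stack = [(src, (src,), {src}) for src in sources]
    while stack:
        node, path, seen = stack.pop()
        if (len(path) - 1) // 2 >= max_hops:
            continue  # path already has max_hops edges
        for rel, tail in graph.get(node, []):
            if tail in seen:
                continue  # skip edges that would revisit a concept
            new_path = path + (rel, tail)
            paths.append(new_path)
            stack.append((tail, new_path, seen | {tail}))
    return paths


def tokens(text: str) -> Set[str]:
    """Lowercased alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def score(path: Path, description_tokens: Set[str]) -> float:
    """Jaccard overlap between the path's concept words and the description."""
    path_tokens: Set[str] = set()
    for concept in path[::2]:  # even positions are concepts, odd are relations
        path_tokens |= tokens(concept)
    union = path_tokens | description_tokens
    return len(path_tokens & description_tokens) / len(union) if union else 0.0


def select_paths(graph, sources, description, max_hops=2, top_k=3) -> List[Path]:
    """Rank candidate paths by score (ties broken alphabetically) and keep top_k."""
    desc = tokens(description)
    candidates = enumerate_paths(graph, sources, max_hops)
    return sorted(candidates, key=lambda p: (-score(p, desc), p))[:top_k]


if __name__ == "__main__":
    # Toy triples invented for illustration; not UMLS content.
    kg = build_graph([
        ("cough", "manifestation_of", "pneumonia"),
        ("cough", "manifestation_of", "asthma"),
        ("fever", "manifestation_of", "pneumonia"),
        ("pneumonia", "may_cause", "sepsis"),
        ("asthma", "treated_by", "albuterol"),
    ])
    for p in select_paths(kg, ["fever", "cough"], "fever and cough with suspected pneumonia"):
        print(" -> ".join(p))
```

Bounding the hop count keeps the candidate set small; a learned scorer would replace `score` while the enumerate-then-rank structure stays the same.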
### Main Contributions

- Proposed the DR.KNOWS graph model, which selects the knowledge paths from the UMLS knowledge graph that are most relevant to a given case.
- Designed the first dedicated human evaluation framework for LLM diagnosis generation and reasoning.
- Demonstrated through empirical analysis that knowledge graphs can improve diagnostic abstraction and the correctness of reasoning.
- Validated that the proposed evaluation framework effectively reveals the strengths and weaknesses of LLMs in diagnosis generation, which can guide future model improvements.

In summary, this paper explores how combining large language models with medical knowledge graphs can improve the accuracy and reliability of automatic diagnosis, and introduces a new human evaluation framework for assessing such systems.
