Abstract: Recent advancements in large language models (LLMs) have achieved promising performance across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK, a.k.a. Dynamic Co-Augmentation of LLMs and KG, to address this limitation and demonstrate its ability in studying Alzheimer's Disease (AD), a specialized sub-field in biomedicine and a global health priority. With a synergized framework of LLM and KG mutually enhancing each other, we first leverage an LLM to construct an evolving AD-specific knowledge graph (KG) sourced from AD-related scientific literature, and then we utilize a coarse-to-fine sampling method with a novel self-aware knowledge retrieval approach to select appropriate knowledge from the KG to augment LLM inference capabilities. The experimental results, conducted on our constructed AD question answering (ADQA) benchmark, underscore the efficacy of DALK. Additionally, we perform a series of detailed analyses that can offer valuable insights and guidelines for the emerging topic of mutually enhancing KG and LLM. We will release the code and data at https://github.com/David-Li0406/DALK.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper addresses the difficulty of integrating domain-specific knowledge into large language models (LLMs) for Alzheimer's Disease (AD) research. Although LLMs perform well on many tasks, they remain limited when handling long-tail and domain-specific knowledge, which restricts their use in vertical domains such as AD research. Specifically, the paper proposes DALK (Dynamic Co-Augmentation of LLMs and KG), a framework that combines LLMs and knowledge graphs (KGs) to address the following issues:

1. **Data Quality Issues**: Scientific literature is dense and redundant, and applying text retrieval methods directly may extract irrelevant and noisy information.
2. **Efficiency and Scale Issues**: Knowledge in the AD field is updated rapidly as the science progresses, but retraining domain-specific LLMs or updating their knowledge requires substantial computational resources.

### Solution

The main contributions of the DALK framework include:

1. **Constructing an AD-specific Knowledge Graph**: Extracting structured and accurate knowledge from AD-related scientific literature to build a knowledge graph specific to the AD domain.
2. **Coarse-to-Fine Sampling Method**: Employing a novel self-aware knowledge retrieval method to select appropriate knowledge to enhance the reasoning capabilities of LLMs.
3. **Evaluation and Analysis**: Constructing an AD Question Answering (ADQA) benchmark dataset and demonstrating the effectiveness of DALK in domain-specific applications through extensive experiments.

### Experimental Results

Experimental results show that DALK outperforms other biomedical LLMs and retrieval-augmented generation (RAG) models on the ADQA benchmark. In particular, the self-aware knowledge retrieval module significantly improves the model's performance on long-context questions.

### Main Contributions

1. **Identifying Limitations of Current Methods**: Highlighting the data quality and efficiency issues faced by LLMs in domain-specific areas such as AD.
2. **Proposing the DALK Framework**: Addressing the aforementioned issues through the co-augmentation of LLMs and KG.
3. **Constructing AD-specific KG and QA Benchmark**: Providing high-quality datasets and detailed experimental results to validate the effectiveness of DALK.
4. **In-depth Analysis**: Conducting a comprehensive analysis of the proposed framework, offering valuable insights and guidance for constructing high-quality KGs and accurately sampling knowledge.
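
The summary above does not spell out how coarse-to-fine sampling works, so the following is only a rough illustration of the general idea, not DALK's actual implementation. It assumes a toy KG stored as (head, relation, tail) triples, a coarse step that keeps triples whose entities appear in the question, and a fine step that ranks those candidates by token overlap. The overlap score is a stand-in for the LLM-based self-aware retrieval the paper describes. All names (`Triple`, `TOY_KG`, `coarse_sample`, `fine_select`, `build_prompt`) and the triples themselves are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str

    def as_text(self) -> str:
        return f"{self.head} {self.relation} {self.tail}"


# Toy knowledge graph; entries are illustrative and not taken from the DALK KG.
TOY_KG = [
    Triple("APOE4", "increases risk of", "Alzheimer's disease"),
    Triple("amyloid beta", "accumulates in", "Alzheimer's disease"),
    Triple("donepezil", "treats", "Alzheimer's disease"),
    Triple("donepezil", "inhibits", "acetylcholinesterase"),
    Triple("insulin", "regulates", "blood glucose"),
]


def tokens(text):
    """Lowercase, strip punctuation, and crudely drop trailing 's' (assumed normalization)."""
    return {w.strip(".,?!").lower().rstrip("s") for w in text.split()} - {""}


def coarse_sample(question, kg):
    """Coarse step: keep every triple whose head or tail entity is mentioned in the question."""
    q = question.lower()
    return [t for t in kg if t.head.lower() in q or t.tail.lower() in q]


def fine_select(question, candidates, k=1):
    """Fine step: rank candidates by token overlap with the question and keep the top k.

    Token overlap is a placeholder for an LLM-based relevance judgment.
    """
    q_tokens = tokens(question)
    ranked = sorted(
        candidates,
        key=lambda t: len(tokens(t.as_text()) & q_tokens),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, triples):
    """Prepend the selected triples to the question as plain-text facts."""
    facts = "\n".join(f"- {t.as_text()}" for t in triples)
    return f"Known facts:\n{facts}\n\nQuestion: {question}\nAnswer:"


question = "What does donepezil inhibit?"
candidates = coarse_sample(question, TOY_KG)      # both donepezil triples
selected = fine_select(question, candidates, k=1)  # the "inhibits" triple
print(build_prompt(question, selected))
```

The two-stage split keeps the expensive ranking step (an LLM call in a real system) restricted to a small candidate set, which is the usual motivation for coarse-to-fine retrieval.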

DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature

Leveraging Social Determinants of Health in Alzheimer's Research Using LLM-Augmented Literature Mining and Knowledge Graphs

Profiling Patient Transcript Using Large Language Model Reasoning Augmentation for Alzheimer's Disease Detection

Leveraging Large Language Models for Identifying Interpretable Linguistic Markers and Enhancing Alzheimer's Disease Diagnostics

Enhancing Large Language Models with Knowledge Graphs for Robust Question Answering

Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models

Alzheimer's Disease Knowledge Graph Enhances Knowledge Discovery and Disease Prediction

Alzheimer Disease Knowledge Graph Enhances Knowledge Discovery and Disease Prediction


Simulate Scientific Reasoning with Multiple Large Language Models: An Application to Alzheimer's Disease Combinatorial Therapy

Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation

A Framework of Knowledge Graph-Enhanced Large Language Model Based on Question Decomposition and Atomic Retrieval

CogMG: Collaborative Augmentation Between Large Language Model and Knowledge Graph

Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering

M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering

medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs

Large Language Models and Medical Knowledge Grounding for Diagnosis Prediction

CDA: A Contrastive Data Augmentation Method for Alzheimer’s Disease Detection

Enhancing Early Detection of Cognitive Decline in the Elderly: A Comparative Study Utilizing Large Language Models in Clinical Notes

KG-EGV: A Framework for Question Answering with Integrated Knowledge Graphs and Large Language Models