Abstract: Electronic Health Records (EHR) have revolutionized healthcare data management and prediction in the field of AI and machine learning. Accurate predictions of diagnoses and medications significantly mitigate health risks and provide guidance for preventive care. However, EHR-driven models often have a limited understanding of medical-domain knowledge and mostly rely on a single, simple ontology. In addition, due to missing features and incomplete disease coverage in EHR, most studies focus only on basic analysis of conditions and medications. We propose DualMAR, a framework that enhances EHR prediction tasks through both individual observation data and public knowledge bases. First, we construct a bi-hierarchical Diagnosis Knowledge Graph (KG) using verified public clinical ontologies and augment this KG via Large Language Models (LLMs). Second, we design a new proxy-task learning on lab results in EHR for pretraining, which further enhances KG representations and patient embeddings. By retrieving radial and angular coordinates in polar space, DualMAR enables accurate predictions based on rich hierarchical and semantic embeddings from the KG. Experiments also demonstrate that DualMAR outperforms state-of-the-art models, validating its effectiveness in EHR prediction and KG integration in medical domains.

What problem does this paper attempt to address?

This paper addresses several key problems in medical prediction tasks that use Electronic Health Record (EHR)-driven models. Specifically:

1. **Limited understanding of medical domain knowledge**: Existing EHR-driven models usually rely on a single, simple ontology and struggle to capture the complex knowledge of the medical field.
2. **Missing data and incomplete disease coverage**: Because EHR data has missing features and incomplete disease coverage, most studies are limited to basic analysis of conditions and medications.
3. **Lack of utilization of laboratory test results**: Existing methods usually ignore key information such as laboratory test results, which are crucial for accurate diagnosis and treatment recommendations.

To solve these problems, the paper proposes a framework named DualMAR, which enhances EHR prediction tasks by combining individual observation data and public knowledge bases. The main contributions of DualMAR include:

- **"Knowledge Scholar" module**: A two-level diagnostic knowledge graph (KG) is constructed and enhanced by a large language model (LLM). This module uses polar-coordinate-space projection to capture semantic and hierarchical information.
- **"Local Expert" module**: A new proxy-task learning method is designed, which uses laboratory results in EHR for pre-training, thereby enhancing patient-embedding representations.
- **Dual-expertise perspective**: An encoder-decoder architecture is adopted. The embeddings from the "Knowledge Scholar" are used as prior knowledge, and these representations are continuously refined by the "Local Expert", ultimately achieving more accurate predictions.

### Formula Summary

1. **KG Fusion Formula**:
   \[ G_H = G_M \cup G_N, \quad \tilde{G}_H = \text{NORMALIZE}(G_H) \]
   where \(G_M\) and \(G_N\) are the knowledge graphs generated from the existing database and from the LLM, respectively.
2. **Polar-Coordinate-Space Embedding Formula**:
   \[ h_r \odot r_r = t_r, \quad (h_a + r_a) \bmod 2\pi = t_a \]
   \[ d_r(h_r, t_r) = \|h_r \odot r_r - t_r\|_2, \quad d_a(h_a, t_a) = \|\sin((h_a + r_a - t_a)/2)\|_1 \]
   \[ d(h, t) = \alpha d_r(h_r, t_r) + \beta d_a(h_a, t_a) \]
3. **Loss Function**:
   \[ L = -\log \sigma(\gamma - d(h, t)) - \sum_{i = 1}^{n} \log \sigma(d(h'_i, t'_i) - \gamma) \]
4. **Attention Mechanism**:
   \[ z_i = \tanh(W_c x_i), \quad r_\tau = \tanh(W_v \sigma(W_u v_\tau)) \]
   \[ \alpha_i^\tau = \frac{\exp(z_i)}{\sum_{j = 1}^n \exp(z_j)}, \quad \beta_\tau = \frac{\exp(r_\tau)}{\sum_{\tau' = 1}^T \exp(r_{\tau'})} \]
5. **Downstream-Task Loss**:
   \[ L_j = \frac{1}{|Y|} \sum_{i = 1}^{|Y|} \text{BCE}(\hat{y}_i, y_i), \quad Y = \{L_1, L_2, L_3\} \]
   \[ L_i = \text{BCE}(\hat{y}_i, y_i), \quad i = 1, 2, 3 \]

Through these methods, DualMAR can...
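
The polar-coordinate distance and the margin-based loss from the formula summary can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the paper's implementation: embeddings are short Python lists instead of learned tensors, the function names are hypothetical, and the values of \(\alpha\), \(\beta\), \(\gamma\) and the toy triples are made up for the example. The loss follows the formula exactly as summarized, without any weighting of negative samples.

```python
import math

def radial_distance(h_r, r_r, t_r):
    # d_r = || h_r ⊙ r_r - t_r ||_2  (element-wise product, then L2 norm)
    return math.sqrt(sum((h * r - t) ** 2 for h, r, t in zip(h_r, r_r, t_r)))

def angular_distance(h_a, r_a, t_a):
    # d_a = || sin((h_a + r_a - t_a) / 2) ||_1
    # |sin(x / 2)| has period 2*pi in x, so angles that differ by a
    # full turn are treated as identical.
    return sum(abs(math.sin((h + r - t) / 2)) for h, r, t in zip(h_a, r_a, t_a))

def polar_distance(head, rel, tail, alpha=1.0, beta=1.0):
    # Each argument is a (radial_part, angular_part) pair of equal-length lists.
    # alpha and beta are illustrative weights, not values from the paper.
    (h_r, h_a), (r_r, r_a), (t_r, t_a) = head, rel, tail
    return (alpha * radial_distance(h_r, r_r, t_r)
            + beta * angular_distance(h_a, r_a, t_a))

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def margin_loss(pos_dist, neg_dists, gamma):
    # L = -log σ(γ - d(h, t)) - Σ_i log σ(d(h'_i, t'_i) - γ)
    positive_term = -log_sigmoid(gamma - pos_dist)
    negative_term = -sum(log_sigmoid(d - gamma) for d in neg_dists)
    return positive_term + negative_term

# Toy triple (made up): the tail matches the head transformed by the relation,
# including one angle that wraps around a full turn.
head = ([1.0, 2.0], [0.5, 1.0])
rel = ([2.0, 0.5], [0.5, 2 * math.pi - 1.0])
tail = ([2.0, 1.0], [1.0, 0.0])
corrupted_tail = ([5.0, 1.0], [1.0, 0.0])  # negative sample with a wrong radius

pos = polar_distance(head, rel, tail)
neg = polar_distance(head, rel, corrupted_tail)
loss = margin_loss(pos, [neg], gamma=1.0)
print(pos, neg, loss)
```

In this toy case the matching triple has a distance of essentially zero and the corrupted triple a clearly larger one, so the loss rewards the model for keeping true triples inside the margin \(\gamma\) and pushing corrupted ones outside it.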

DualMAR: Medical-Augmented Representation from Dual-Expertise Perspectives
