Abstract:<h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Objective</h3><p>Medical knowledge graphs (KGs) are attracting attention from both academia and the healthcare industry because of their power in intelligent healthcare applications. In this paper, we introduce a systematic approach to building a medical KG from electronic medical records (EMRs), evaluated through both technical experiments and end-to-end application examples.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Materials and Methods</h3><p>The original data set contains 16,217,270 de-identified clinical visit records from 3,767,198 patients. The KG construction procedure comprises 8 steps: data preparation, entity recognition, entity normalization, relation extraction, property calculation, graph cleaning, related-entity ranking, and graph embedding. We propose a novel quadruplet structure to represent medical knowledge instead of the classical KG triplet. We also propose a novel related-entity ranking function that considers probability, specificity, and reliability (PSR). In addition, the probabilistic translation on hyperplanes (PrTransH) algorithm is used to learn graph embeddings for the generated KG.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Results</h3><p>A medical KG with 9 entity types, such as disease and symptom, was established; it contains 22,508 entities and 579,094 quadruplets. Compared with the term frequency-inverse document frequency (TF/IDF) method, the proposed ranking function increased the normalized discounted cumulative gain (NDCG) from 0.799 to 0.906. Embedding representations for all entities and relations were learned and shown to be effective through disease clustering.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Conclusion</h3><p>The established systematic procedure can efficiently construct a high-quality medical KG from large-scale EMRs. 
The proposed ranking function PSR achieves the best performance under all relations, and the disease clustering result validates the efficacy of the learned embedding vectors as semantic representations of entities. Moreover, the obtained KG supports many successful applications thanks to its statistics-based quadruplets.</p><p>where <span class="math"><math><msubsup><mi>N</mi><mi>co</mi><mi>min</mi></msubsup></math></span> is a minimum co-occurrence number and <em>R</em> is the basic reliability value. The reliability value measures how reliable the relationship between <em>S<sub>i</sub></em> and <em>O<sub>ij</sub></em> is. The rationale for this definition is that the higher the value of <em>N</em><sub>co</sub>(<em>S<sub>i</sub></em>, <em>O<sub>ij</sub></em>), the more reliable the relationship. However, the reliability values of two relationships should not differ much if both of their co-occurrence numbers are very large. In our study, we set <span class="math"><math><msubsup><mi>N</mi><mi>co</mi><mi>min</mi></msubsup></math></span> = 10 and <em>R</em> = 1 after some experiments. For instance, if the co-occurrence numbers of three relationships are 1, 100, and 10000, their reliability values are 1, 2.96, and 5, respectively.</p>
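For readers unfamiliar with the NDCG metric cited in the Results, the following is a minimal sketch of how a ranked list of related entities could be scored. It assumes the common linear-gain, log2-discount form of NDCG; the exact variant and cutoff used in the paper are not stated in this excerpt. The function names and the relevance grades in the usage lines are hypothetical, not data from the study.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each grade is divided by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """NDCG of graded relevance labels given in ranked order.

    Assumption: linear gain and log2 discount; other variants exist.
    If k is given, only the top-k positions are scored.
    """
    relevances = list(relevances)
    ranked = relevances[:k] if k is not None else relevances
    # The ideal ranking places the highest grades first.
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical expert grades for the top entities returned by two rankers.
ranker_a = [3, 2, 3, 0, 1]
ranker_b = [0, 1, 2, 3, 3]
print(ndcg(ranker_a), ndcg(ranker_b))
```

A ranking that puts the most relevant entities first scores closer to 1, which is the sense in which a higher NDCG indicates a better ranking function.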

PDD Graph: Bridging Electronic Medical Records and Biomedical Knowledge Graphs via Entity Linking

DEKGB - An Extensible Framework for Health Knowledge Graph

PatientEG Dataset: Bringing Event Graph Model with Temporal Relations to Electronic Medical Records

Predicting Rich Drug-Drug Interactions via Biomedical Knowledge Graphs and Text Jointly Embedding

Real-world data medical knowledge graph: construction and applications

A computable biomedical knowledge system: Toward rapidly building candidate‐directed acyclic graphs

Semantic Health Knowledge Graph: Semantic Integration of Heterogeneous Medical Knowledge and Services

Research of an Extensible Framework for Health Knowledge Graph

Heterogeneous graph construction and HinSAGE learning from electronic medical records

Rare disease knowledge enrichment through a data-driven approach

Construction of a knowledge graph for breast cancer diagnosis based on Chinese electronic medical records: development and usability study

Multi-Source Graph Synthesis (MUGS) for Pediatric Knowledge Graphs from Electronic Health Records

Electronic Health Record-Oriented Knowledge Graph System for Collaborative Clinical Decision Support Using Multicenter Fragmented Medical Data: Design and Application Study

MedGraph: Structural and Temporal Representation Learning of Electronic Medical Records

SMR: Medical Knowledge Graph Embedding for Safe Medicine Recommendation

BioMedGraphica: An All-in-One Platform for Biomedical Prior Knowledge and Omic Signaling Graph Generation

Unifying Diagnosis Identification and Prediction Method Embedding the Disease Ontology Structure From Electronic Medical Records

Accuracy of point-of-care ultrasound for identifying fractures in patients with orthopaedic trauma presenting to emergency department of the All India Institute of Medical Sciences, level 1 trauma centre

Enhancing ophthalmology medical record management with multi-modal knowledge graphs

RDBridge: a knowledge graph of rare diseases based on large-scale text mining

Early detection of Parkinson’s disease through enriching the electronic health record using a biomedical knowledge graph