Large Language Models for Biomedical Knowledge Graph Construction: Information extraction from EMR notes

Vahan Arsenyan, Spartak Bughdaryan, Fadi Shaya, Kent Small, Davit Shahnazaryan

2023-12-10

Abstract: The automatic construction of knowledge graphs (KGs) is an important research area in medicine, with far-reaching applications spanning drug discovery and clinical trial design. These applications hinge on the accurate identification of interactions among medical and biological entities. In this study, we propose an end-to-end machine learning solution based on large language models (LLMs) that utilize electronic medical record notes to construct KGs. The entities used in the KG construction process are diseases, factors, treatments, as well as manifestations that coexist with the patient while experiencing the disease. Given the critical need for high-quality performance in medical applications, we embark on a comprehensive assessment of 12 LLMs of various architectures, evaluating their performance and safety attributes. To gauge the quantitative efficacy of our approach by assessing both precision and recall, we manually annotate a dataset provided by the Macula and Retina Institute. We also assess the qualitative performance of LLMs, such as the ability to generate structured outputs or the tendency to hallucinate. The results illustrate that in contrast to encoder-only and encoder-decoder, decoder-only LLMs require further investigation. Additionally, we provide guided prompt design to utilize such LLMs. The application of the proposed methodology is demonstrated on age-related macular degeneration.
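The abstract mentions scoring the approach by precision and recall against manually annotated data. A minimal sketch of that kind of scoring is shown here, assuming extracted relations are compared as exact-match `(disease, relation, entity)` triples. The triple schema, the function name `precision_recall`, and the toy values are illustrative assumptions, not the paper's data or evaluation code.

```python
from typing import Set, Tuple

# Hypothetical schema: (disease, relation, entity), all lowercased.
Triple = Tuple[str, str, str]


def precision_recall(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float]:
    """Exact-match precision and recall of predicted triples against gold triples."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy annotations, made up for illustration only.
gold = {
    ("amd", "treatment", "anti-vegf injection"),
    ("amd", "factor", "smoking"),
    ("amd", "manifestation", "drusen"),
}
# Toy model output: two correct triples and one spurious triple.
predicted = {
    ("amd", "treatment", "anti-vegf injection"),
    ("amd", "factor", "smoking"),
    ("amd", "factor", "glaucoma"),
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Exact matching is the simplest choice. A real evaluation on clinical notes would likely need entity normalization (synonyms, abbreviations) before comparison.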

Computation and Language, Machine Learning

What problem does this paper attempt to address?

The paper addresses the automatic construction of medical knowledge graphs from electronic medical record (EMR) notes. Specifically, it proposes an end-to-end machine learning solution based on large language models (LLMs) that identifies relationships between diseases and the factors, treatments, and manifestations that coexist with the patient while experiencing the disease. The main contributions are:

1. **An end-to-end approach**: LLMs are used to construct knowledge graphs automatically from EMR notes.
2. **An extensive evaluation**: 12 LLMs of various architectures are assessed for performance and safety, with a particular focus on clinical relation extraction.
3. **Guided prompt design**: a guided prompt design is introduced so that decoder-only LLMs can be used for relation extraction when building knowledge graphs over medical entities.

The study also found that, compared to encoder-only and encoder-decoder models, decoder-only models need further improvement in producing structured output. With the guided prompt design, some decoder-only models can be adapted to the task. The paper also details an application to age-related macular degeneration (AMD) to demonstrate the proposed method.
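To make the idea of a guided prompt with structured output concrete, here is a minimal sketch that builds a prompt asking for one `relation: entity` pair per line, parses a model response while discarding lines that do not follow the format, and groups the result into a small graph. The prompt wording, the output format, and the names `build_prompt`, `parse_response`, and `build_graph` are assumptions made for illustration; the paper's actual prompts are not reproduced here, and no model is called.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Relation types named in the summary above.
RELATIONS = ("factor", "treatment", "manifestation")


def build_prompt(disease: str, note: str) -> str:
    """Hypothetical guided prompt that constrains the answer format."""
    return (
        f"From the clinical note below, list entities related to {disease}.\n"
        "Answer with one item per line in the form 'relation: entity', "
        f"where relation is one of: {', '.join(RELATIONS)}.\n"
        f"Note: {note}\n"
        "Answer:\n"
    )


def parse_response(disease: str, response: str) -> List[Tuple[str, str, str]]:
    """Keep only well-formed 'relation: entity' lines; skip everything else."""
    triples = []
    for line in response.splitlines():
        relation, sep, entity = line.partition(":")
        relation, entity = relation.strip().lower(), entity.strip().lower()
        if sep and relation in RELATIONS and entity:
            triples.append((disease, relation, entity))
    return triples


def build_graph(triples: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Group tail entities under each (disease, relation) pair."""
    graph = defaultdict(set)
    for head, relation, tail in triples:
        graph[(head, relation)].add(tail)
    return dict(graph)


# Made-up response, including a chatty line that breaks the format.
response = "treatment: anti-VEGF injection\nfactor: smoking\nI hope this helps!"
triples = parse_response("amd", response)
graph = build_graph(triples)
print(graph)
```

Dropping malformed lines instead of trying to repair them keeps unstructured model text out of the graph, at the cost of some recall.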
