GAMedX: Generative AI-based Medical Entity Data Extractor Using Large Language Models

Mohammed-Khalil Ghali, Abdelrahman Farrag, Hajar Sakai, Hicham El Baz, Yu Jin, Sarah Lam
2024-05-31
Abstract: In the rapidly evolving field of healthcare and beyond, the integration of generative AI in Electronic Health Records (EHRs) represents a pivotal advancement, addressing a critical gap in current information extraction techniques. This paper introduces GAMedX, a Named Entity Recognition (NER) approach utilizing Large Language Models (LLMs) to efficiently extract entities from medical narratives and unstructured text generated throughout the various phases of a patient's hospital visit. By addressing the significant challenge of processing unstructured medical text, GAMedX leverages the capabilities of generative AI and LLMs for improved data extraction. Employing a unified approach, the methodology integrates open-source LLMs for NER, utilizing chained prompts and Pydantic schemas for structured output to navigate the complexities of specialized medical jargon. The findings reveal a significant ROUGE F1 score on one of the evaluation datasets, with an accuracy of 98%. This innovation enhances entity extraction, offering a scalable, cost-effective solution for automated form filling from unstructured data. As a result, GAMedX streamlines the processing of unstructured narratives and sets a new standard in NER applications, contributing significantly to theoretical and practical advancements beyond the medical technology sphere.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the problem of efficiently extracting entity information from unstructured Electronic Health Records (EHRs) in the medical field. Specifically, it proposes a method called GAMedX, which leverages Large Language Models (LLMs) to identify and extract key entities from medical narratives and unstructured text. By tackling the challenges of processing unstructured medical text, GAMedX aims to improve the accuracy and efficiency of data extraction and to provide a scalable, cost-effective solution for automated form filling. The method also emphasizes reliability, consistency, and seamless workflow integration in practical applications, with the goal of optimizing hospital resources and services and improving patient health outcomes. In short, the core objective of the paper is to develop a system that extracts structured entity data from unstructured EHR text, advancing medical information extraction and simplifying the clinical documentation process.
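The abstract mentions Pydantic schemas for structured output. As a minimal sketch of that idea, and not the authors' implementation, the following validates a JSON string against a schema before it would be used to fill a form. The `PatientRecord` fields, the `parse_llm_output` helper, and the sample string are all hypothetical; no LLM is actually called.

```python
import json
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class PatientRecord(BaseModel):
    """Hypothetical target form; field names are illustrative, not from the paper."""
    patient_name: Optional[str] = None
    age: Optional[int] = None
    symptoms: List[str] = []
    medications: List[str] = []


def parse_llm_output(raw: str) -> Optional[PatientRecord]:
    """Validate a JSON string (as an LLM might return it) against the schema.

    Returns None when the text is not valid JSON or does not fit the schema.
    """
    try:
        return PatientRecord(**json.loads(raw))
    except (json.JSONDecodeError, ValidationError, TypeError):
        return None


# Stand-in for a model response; assumed for illustration only.
sample_output = (
    '{"patient_name": "Jane Doe", "age": 54, '
    '"symptoms": ["chest pain", "dyspnea"], "medications": ["aspirin"]}'
)
record = parse_llm_output(sample_output)

# A response whose "age" cannot be read as an integer is rejected.
malformed = parse_llm_output('{"age": "unknown"}')
```

Rejecting output that fails validation, rather than passing free text downstream, is one plausible way to keep extracted entities consistent enough for automated form filling.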