Abstract: Electronic health records (EHRs) house crucial patient data in clinical notes. As these notes grow in volume and complexity, manual extraction becomes challenging. This work introduces a natural language interface using large language models (LLMs) for dynamic question-answering on clinical notes. Our chatbot, powered by Langchain and transformer-based LLMs, allows users to query in natural language, receiving relevant answers from clinical notes. Experiments, utilizing various embedding models and advanced LLMs, show Wizard Vicuna's superior accuracy, albeit with high compute demands. Model optimization, including weight quantization, improves latency by approximately 48 times. Promising results indicate potential, yet challenges such as model hallucinations and limited diverse medical case evaluations remain. Addressing these gaps is crucial for unlocking the value in clinical notes and advancing AI-driven clinical decision-making.

What problem does this paper attempt to address?

This paper addresses dynamic question answering over clinical documents. Electronic health records (EHRs) contain a growing volume of complex, unstructured clinical notes, which makes it difficult for researchers and clinicians to search them manually and extract relevant information. Traditional information retrieval methods fall short here because they lack semantic and contextual understanding, whereas large language models (LLMs) are better at comprehending complex natural language text.

To address this, the paper introduces a natural language dialogue interface built on LLMs that lets users explore clinical notes through dynamic question answering. The system uses the Langchain framework and a Transformer-based language model to build a chatbot interface: users ask questions in natural language, and answers are drawn from the relevant parts of the clinical notes. The paper evaluates how well different semantic embedding models, such as SentenceTransformers, and different LLMs encode queries and documents for information retrieval.

The experiments show that the "Wizard Vicuna" model, with 1.3 billion parameters, achieves the highest accuracy but requires significant computational resources. To improve inference latency and deployability, the paper applies weight quantization, which reduces latency by a factor of approximately 48. Despite these encouraging results, limitations remain, including model hallucination and the lack of robust evaluation across diverse medical cases. Future work will focus on closing these gaps in order to fully exploit clinical notes and support AI-driven clinical decision-making.
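The retrieval step summarized above (encode the query and the note chunks, rank chunks by similarity, pass the best ones to the language model as context) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `embed` function is a toy bag-of-words stand-in for a semantic embedding model such as SentenceTransformers, and the names `embed`, `cosine`, `retrieve`, and `build_prompt`, the prompt wording, and the sample notes are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic embedding model (assumption):
    a bag-of-words count vector over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k note chunks most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble a prompt for a language model (hypothetical wording)."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer the question using only the clinical note excerpts below.\n"
        f"Excerpts:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


# Invented sample notes, for illustration only.
notes = [
    "Patient reports chest pain radiating to the left arm.",
    "Metformin 500 mg twice daily was started for type 2 diabetes.",
    "Follow-up chest x-ray showed no acute findings.",
]

question = "What medication was started for diabetes?"
top = retrieve(question, notes, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real system the prompt would be sent to the LLM, and `embed` would be replaced by a learned embedding model so that paraphrases with no shared words can still match. Restricting the model to retrieved excerpts is one common way to limit hallucination, though it does not eliminate it.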

Dynamic Q&A of Clinical Documents with Large Language Models
