Can Large Language Models abstract Medical Coded Language?

Simon A. Lee, Timothy Lindsey

2024-06-07

Abstract: Large Language Models (LLMs) have become a pivotal research area, potentially making beneficial contributions in fields like healthcare, where they can streamline automated billing and decision support. However, the frequent use of specialized coded languages like ICD-10, which are regularly updated and deviate from natural language formats, presents potential challenges for LLMs in creating accurate and meaningful latent representations. This raises concerns among healthcare professionals about potential inaccuracies or "hallucinations" that could directly affect patients. Therefore, this study evaluates whether LLMs are aware of medical code ontologies and can accurately generate names from these codes. We assess the capabilities and limitations of both general and biomedical-specific generative models, such as GPT, LLaMA-2, and Meditron, focusing on their proficiency with domain-specific terminologies. While the results indicate that LLMs struggle with coded language, we offer insights on how to adapt these models to reason more effectively.

Computation and Language

What problem does this paper attempt to address?

The paper primarily explores the capabilities and limitations of large language models (LLMs) in handling medical coding language. Specifically, the researchers evaluated whether these models can understand medical coding ontologies and accurately generate corresponding names. The models used in the study include general models and biomedical-specific generative models, such as GPT, LLaMA-2, and Meditron.

The core questions of the paper are:

- Can large language models accurately generate the correct labels from medical codes?
- Do these models exhibit "hallucinations" or errors when dealing with standardized medical coding (such as ICD-10)?

The study evaluates the models' performance through a series of experiments, including predicting medical chapter names, generating medical code names, and adversarial attack experiments. The results indicate that although some models (like GPT-4) perform relatively well, large language models generally face significant difficulties in handling medical coding, particularly in distinguishing between real and fake codes. Additionally, the study found that the models perform better with common codes than with rare ones.

The paper concludes by proposing several potential solutions, including the use of knowledge graphs, enhancing reasoning capabilities, and generating synthetic data, to improve the performance of large language models in handling medical coding.
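To make the two kinds of experiment concrete (generating code names, and separating real from fake codes), here is a minimal Python sketch of how such an evaluation could be scored. It is not the paper's protocol. The exact-match scoring rule, the three-entry reference table, the made-up codes `ZZ9` and `QQ1`, the `"unknown code"` refusal string, and the `toy_model` stand-in for an LLM call are all assumptions chosen for illustration.

```python
import re
from typing import Callable, Dict, Iterable

# Tiny illustrative reference table; a real ICD-10 table has thousands of entries.
ICD10_NAMES: Dict[str, str] = {
    "A00": "Cholera",
    "I10": "Essential (primary) hypertension",
    "J45": "Asthma",
}


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so formatting differences are ignored."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def name_accuracy(model: Callable[[str], str], reference: Dict[str, str]) -> float:
    """Fraction of real codes whose generated name exactly matches the reference name."""
    hits = sum(normalize(model(code)) == normalize(name) for code, name in reference.items())
    return hits / len(reference)


def fake_code_rejection_rate(
    model: Callable[[str], str],
    fake_codes: Iterable[str],
    refusal: str = "unknown code",  # assumed refusal phrase, not from the paper
) -> float:
    """Fraction of fabricated codes that the model declines to name."""
    codes = list(fake_codes)
    rejected = sum(normalize(model(code)) == normalize(refusal) for code in codes)
    return rejected / len(codes)


def toy_model(code: str) -> str:
    """Hypothetical stand-in for an LLM: one wrong name and one hallucinated fake code."""
    memory = {
        "A00": "Cholera",
        "I10": "essential (primary) hypertension.",
        "J45": "Bronchitis",                      # wrong name for a real code
        "ZZ9": "Disorder of unspecified organ",   # hallucinated name for a fake code
    }
    return memory.get(code, "unknown code")


if __name__ == "__main__":
    print("name accuracy:", name_accuracy(toy_model, ICD10_NAMES))
    print("fake-code rejection rate:", fake_code_rejection_rate(toy_model, ["ZZ9", "QQ1"]))
```

Exact matching after normalization is a deliberately strict choice. A softer measure, such as token overlap or embedding similarity, would give credit for near-miss names and could be swapped in without changing the rest of the sketch.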

Large language models are good medical coders, if provided with tools

Large language models in medicine: the potentials and pitfalls

Large language models in medical and healthcare fields: applications, advances, and challenges

Large language models for science and medicine

Large Language Models in the Medical Field: Principles and Applications

A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions

A Study of Generative Large Language Model for Medical Research and Healthcare

Demystifying Large Language Models for Medicine: A Primer

Large Language Model Prompting Techniques for Advancement in Clinical Medicine

Evaluating large language models in medical applications: a survey

Is larger always better? Evaluating and prompting large language models for non-generative medical tasks

Large language models in healthcare and medical domain: A review

Embracing Large Language Models for Medical Applications: Opportunities and Challenges

Systematic Review of Large Language Models for Patient Care: Current Applications and Challenges

Large Language Models as Agents in the Clinic

Large Language Models for Medicine: A Survey

Bespoke Large Language Models for Digital Triage Assistance in Mental Health Care

LLMD: A Large Language Model for Interpreting Longitudinal Medical Records

Large language models in health care: Development, applications, and challenges