LLMD: A Large Language Model for Interpreting Longitudinal Medical Records

Robert Porter, Adam Diehl, Benjamin Pastel, J. Henry Hinnefeld, Lawson Nerenberg, Pye Maung, Sebastien Kerbrat, Gillian Hanson, Troy Astorino, Stephen J. Tarsa
2024-10-12
Abstract: We introduce LLMD, a large language model designed to analyze a patient's medical history based on their medical records. Along with domain knowledge, LLMD is trained on a large corpus of records collected over time and across facilities, as well as tasks and labels that make nuanced connections among them. This approach is critical to an accurate picture of patient health, and has distinctive advantages over models trained on knowledge alone, unlabeled records, structured EHR data, or records from a single health system. The recipe for LLMD continues pretraining a foundational model on both domain knowledge and the contents of millions of records. These span an average of 10 years of care and as many as 140 care sites per patient. LLMD is then instruction fine-tuned on structuring and abstraction tasks. The former jointly identify and normalize document metadata, provenance information, clinical named entities, and ontology mappings, while the latter roll these into higher-level representations, such as a continuous era of time a patient was on a medication. LLMD is deployed within a layered validation system that includes continual random audits and review by experts, e.g. based on uncertainty, disease-specific rules, or use case. LLMD exhibits large gains over both more powerful generalized models and domain-specific models. On medical knowledge benchmarks, LLMD-8B achieves state-of-the-art accuracy on PubMedQA text responses, besting orders-of-magnitude larger models. On production tasks, we show that LLMD significantly outperforms all other models evaluated, and among alternatives, large general-purpose LLMs like GPT-4o are more accurate than models emphasizing medical knowledge. We find strong evidence that accuracy on today's medical benchmarks is not the most significant factor when analyzing real-world patient data, an insight with implications for future medical LLMs.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to use large language models (LLMs) to analyze a patient's medical history from their medical records. It introduces LLMD, a large language model trained on medical domain knowledge together with a large corpus of records collected over long time spans and across many care facilities, which lets it build a more accurate picture of patient health. The authors argue that this has distinct advantages over training on knowledge alone, on unlabeled records, on structured data from electronic health record (EHR) aggregators, or on records from a single health system. LLMD is designed to support virtual care, care coordination, and the construction of datasets for more than 60 research studies, including data submitted to the FDA.
The training recipe has two stages. First, pretraining of a base model is continued on domain knowledge and the contents of millions of records, which span an average of 10 years of care and up to 140 care sites per patient. Second, LLMD is instruction fine-tuned on structuring and abstraction tasks: structuring tasks jointly identify and normalize document metadata, provenance information, clinical named entities, and ontology mappings, while abstraction tasks roll these into higher-level representations, such as the continuous period during which a patient was taking a given medication. LLMD is deployed within a layered validation system that includes continual random audits and expert review triggered by uncertainty, disease-specific rules, or the end use of the data. This system provides feedback for improving LLMD and fine-grained data-quality control for different needs.
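To make the abstraction step concrete, the sketch below shows one plausible shape for the "medication era" representation: dated mentions of an already-normalized drug are merged into continuous spans whenever consecutive mentions fall within a gap threshold. This is a hypothetical, rule-based illustration of the target output only; the paper does not describe how LLMD forms eras (LLMD learns the task via instruction fine-tuning). The names `MedicationEra`, `build_medication_eras`, and the 30-day `max_gap_days` default are assumptions for illustration, and drug names are assumed to be normalized upstream (e.g. mapped to an ontology concept).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class MedicationEra:
    """Hypothetical record: a span during which a patient is assumed to be on a drug."""
    drug: str
    start: date
    end: date


def build_medication_eras(mentions, max_gap_days=30):
    """Collapse (drug, date) mentions into continuous eras per drug.

    Assumption: two consecutive mentions of the same normalized drug belong
    to the same era when they are at most `max_gap_days` apart.
    """
    # Group mention dates by (already normalized) drug name.
    by_drug = defaultdict(list)
    for drug, when in mentions:
        by_drug[drug].append(when)

    eras = []
    gap = timedelta(days=max_gap_days)
    for drug in sorted(by_drug):
        dates = sorted(by_drug[drug])
        start = end = dates[0]
        for d in dates[1:]:
            if d - end <= gap:
                end = d  # within the gap threshold: extend the current era
            else:
                eras.append(MedicationEra(drug, start, end))
                start = end = d  # gap too large: close the era and start a new one
        eras.append(MedicationEra(drug, start, end))
    return eras


if __name__ == "__main__":
    mentions = [
        ("metformin", date(2020, 1, 5)),
        ("metformin", date(2020, 1, 30)),
        ("metformin", date(2020, 6, 1)),
        ("lisinopril", date(2021, 3, 1)),
    ]
    for era in build_medication_eras(mentions):
        print(era)
```

With these inputs, the two January metformin mentions (25 days apart) merge into one era, while the June mention starts a second era. A fixed gap threshold is the simplest possible rule; real records would also need to handle explicit stop dates, dose changes, and conflicting sources, which is part of why the paper treats this as a learned task rather than a hand-written rule.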
As a result, LLMD not only performs well on medical knowledge benchmarks but also significantly outperforms the other evaluated models on production tasks involving real-world patient data. The authors conclude that a model's accuracy depends not only on the breadth of its medical knowledge but also on how well it handles the complexity of real records, which validates the LLMD approach and has important implications for the development of future medical LLMs.