Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare

Emre Can Acikgoz, Osman Batur İnce, Rayene Bench, Arda Anıl Boz, İlker Kesen, Aykut Erdem, Erkut Erdem

2024-04-25

Abstract: The integration of Large Language Models (LLMs) into healthcare promises to transform medical diagnostics, research, and patient care. Yet the progress of medical LLMs faces obstacles such as complex training requirements, rigorous evaluation demands, and the dominance of proprietary models that restrict academic exploration. Transparent, comprehensive access to LLM resources is essential for advancing the field, fostering reproducibility, and encouraging innovation in healthcare AI. We present Hippocrates, an open-source LLM framework specifically developed for the medical domain. In stark contrast to previous efforts, it offers unrestricted access to its training datasets, codebase, checkpoints, and evaluation protocols. This open approach is designed to stimulate collaborative research, allowing the community to build upon, refine, and rigorously evaluate medical LLMs within a transparent ecosystem. We also introduce Hippo, a family of 7B models tailored for the medical domain, fine-tuned from Mistral and LLaMA2 through continual pre-training, instruction tuning, and reinforcement learning from human and AI feedback. Our models outperform existing open medical LLMs by a large margin, even surpassing models with 70B parameters. Through Hippocrates, we aspire to unlock the full potential of LLMs not just to advance medical knowledge and patient care but also to democratize the benefits of AI research in healthcare, making them available across the globe.

Machine Learning, Artificial Intelligence, Computation and Language

What problem does this paper attempt to address?

The paper addresses the barriers to applying large language models (LLMs) in medicine. Although LLMs perform well across many natural language processing tasks, in clinical settings they are held back by a lack of domain-specific knowledge and by the complexity of medical terminology. These gaps limit their potential in medical diagnosis, research, and patient care. To overcome these barriers, the paper introduces **Hippocrates**, an open-source LLM framework developed specifically for the medical domain. Hippocrates aims to advance medical LLMs in three ways:

1. **Open access**: It provides unrestricted access to its training datasets, codebase, checkpoints, and evaluation protocols, promoting transparency and reproducibility and encouraging innovation in medical AI.
2. **Model improvement**: It fine-tunes existing models (Mistral and LLaMA2) through continual pre-training, instruction tuning, and reinforcement learning from human and AI feedback, producing models better suited to the medical domain.
3. **Performance enhancement**: The resulting Hippo family of 7B-parameter models outperforms existing open medical LLMs by a large margin on multiple medical benchmarks, even surpassing models with 70B parameters.

Through these measures, Hippocrates aims not only to improve the domain expertise of medical LLMs but also to make the benefits of AI research in healthcare available worldwide, advancing medical knowledge and improving patient care.
