MedMobile: A mobile-sized language model with expert-level clinical capabilities

Krithik Vishwanath, Jaden Stryker, Anton Alaykin, Daniel Alexander Alber, Eric Karl Oermann
2024-10-12
Abstract: Language models (LMs) have demonstrated expert-level reasoning and recall abilities in medicine. However, computational costs and privacy concerns are mounting barriers to wide-scale implementation. We introduce a parsimonious adaptation of phi-3-mini, MedMobile, a 3.8 billion parameter LM capable of running on a mobile device, for medical applications. We demonstrate that MedMobile scores 75.7% on the MedQA (USMLE), surpassing the passing mark for physicians (~60%), and approaching the scores of models 100 times its size. We subsequently perform a careful set of ablations, and demonstrate that chain of thought, ensembling, and fine-tuning lead to the greatest performance gains, while unexpectedly retrieval augmented generation fails to demonstrate significant improvements.
Computation and Language
What problem does this paper attempt to address?
The main problems this paper attempts to address are:

1. **Computational Cost and Privacy Issues**: Although large language models (LLMs) have demonstrated expert-level reasoning and recall in medicine, their high computational cost and the privacy concerns they raise are major obstacles to large-scale implementation.
2. **Balancing Model Size and Performance**: Existing high-performance large models are often closed-source, which limits how readily they can be adapted to specific domains. There is therefore a need for a small language model that can run on a mobile device while retaining expert-level clinical capabilities.

To this end, the researchers introduce **MedMobile**, an adaptation of phi-3-mini with 3.8 billion parameters that is capable of running on a mobile device and is intended for medical applications. MedMobile scores 75.7% on MedQA (USMLE), surpassing the passing mark for physicians (about 60%) and approaching the scores of models 100 times its size.

The authors then run a set of ablations and find that Chain of Thought (CoT), ensembling, and supervised fine-tuning produce the greatest performance gains, while retrieval-augmented generation unexpectedly fails to show a significant improvement.

In summary, the paper aims to address the computational cost and privacy issues of current large language models in medical applications by developing a compact language model with expert-level clinical capabilities that can run on a mobile device.
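To make the ensembling idea concrete, here is a minimal sketch of one common way to combine several chain-of-thought samples: parse a final answer letter from each completion and take a majority vote. This illustrates the general technique only, not the authors' implementation. The `Answer: X` output format, the option letters A to E, and the helper names `extract_choice` and `ensemble_answer` are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed output convention: each completion ends its reasoning with a line
# such as "Answer: B" or "Answer: (B)". This format is hypothetical.
ANSWER_PATTERN = re.compile(r"Answer:\s*\(?([A-E])\b", re.IGNORECASE)


def extract_choice(completion):
    """Return the last answer letter found in a completion, or None."""
    matches = ANSWER_PATTERN.findall(completion)
    return matches[-1].upper() if matches else None


def ensemble_answer(completions):
    """Majority vote over the parsed answers of several sampled completions.

    Completions with no parsable answer are ignored. Ties go to the answer
    that was seen first. Returns None if nothing could be parsed.
    """
    votes = [c for c in map(extract_choice, completions) if c is not None]
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    samples = [
        "The presentation suggests iron deficiency. Answer: B",
        "Ferritin is low, so consider absorption. Answer: (C)",
        "Microcytic anemia with low ferritin. answer: b",
        "The reasoning trails off without a final line.",
    ]
    print(ensemble_answer(samples))
```

In practice the completions would come from sampling the model several times on the same question; the voting step itself is independent of which model produced them.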