Towards a Personal Health Large Language Model

Justin Cosentino, Anastasiya Belyaeva, Xin Liu, Nicholas A. Furlotte, Zhun Yang, Chace Lee, Erik Schenck, Yojan Patel, Jian Cui, Logan Douglas Schneider, Robby Bryant, Ryan G. Gomes, Allen Jiang, Roy Lee, Yun Liu, Javier Perez, Jameson K. Rogers, Cathy Speed, Shyam Tailor, Megan Walker, Jeffrey Yu, Tim Althoff, Conor Heneghan, John Hernandez, Mark Malhotra, Leor Stern, Yossi Matias, Greg S. Corrado, Shwetak Patel, Shravya Shetty, Jiening Zhan, Shruthi Prabhakara, Daniel McDuff, Cory Y. McLean
2024-06-11
Abstract: In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We created and curated three datasets that test 1) production of personalized insights and recommendations from sleep patterns, physical activity, and physiological responses, 2) expert domain knowledge, and 3) prediction of self-reported sleep outcomes. For the first task we designed 857 case studies in collaboration with domain experts to assess real-world scenarios in sleep and fitness. Through comprehensive evaluation of domain-specific rubrics, we observed that Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness and, while experts remain superior for sleep, fine-tuning PH-LLM provided significant improvements in using relevant domain knowledge and personalizing information for sleep insights. We evaluated PH-LLM domain knowledge using multiple choice sleep medicine and fitness examinations. PH-LLM achieved 79% on sleep and 88% on fitness, exceeding average scores from a sample of human experts. Finally, we trained PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encoding representations of wearable data, and demonstrate that multimodal encoding is required to match performance of specialized discriminative models. Although further development and evaluation are necessary in the safety-critical personal health domain, these results demonstrate both the broad knowledge and capabilities of Gemini models and the benefit of contextualizing physiological data for personal health applications as done with PH-LLM.
Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper addresses the following issues:

1. **Understanding and Application of Personalized Health Data**: Most LLM research in health has focused on clinical tasks, while the rich, longitudinal data from mobile and wearable devices (sleep patterns, physical activity, physiological responses) is rarely integrated into such tasks. The paper introduces the Personal Health Large Language Model (PH-LLM), a version of Gemini fine-tuned to understand and reason over numerical time-series personal health data.
2. **Evaluating Model Performance**: To evaluate PH-LLM systematically, the authors created and curated three datasets that test (a) generation of personalized insights and recommendations, (b) expert domain knowledge, and (c) prediction of self-reported sleep outcomes.
3. **Improving the Quality of Personalized Recommendations**: On 857 case studies designed with domain experts, Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness. Experts remain superior for sleep, although fine-tuning significantly improved PH-LLM's use of relevant domain knowledge and its personalization of sleep insights. On multiple-choice examinations, PH-LLM scored 79% on sleep medicine and 88% on fitness, exceeding the average scores of a sample of human experts.
4. **Multimodal Data Encoding**: The paper compares textual and multimodal encodings of wearable data for predicting self-reported sleep quality, and finds that multimodal encoding is required to match the performance of specialized discriminative models.

In summary, the paper aims to show how a fine-tuned language model can understand and reason over personal health data to support personalized insights and recommendations, while noting that further development and evaluation are necessary in the safety-critical personal health domain.
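The abstract mentions a textual encoding of wearable data but this page does not describe its format. As a purely illustrative sketch, assuming per-day summaries with hypothetical field names (`sleep_minutes`, `resting_hr_bpm`, `steps`), numeric readings could be serialized into plain text that a language model can take as input. This is not the authors' encoding, and the sample values are made up.

```python
# Hypothetical daily wearable summaries; field names and values are
# illustrative assumptions, not data or schema from the paper.
SAMPLE_DAYS = [
    {"date": "2024-01-01", "sleep_minutes": 412, "resting_hr_bpm": 58, "steps": 9150},
    {"date": "2024-01-02", "sleep_minutes": 365, "resting_hr_bpm": 61, "steps": 4020},
    {"date": "2024-01-03", "sleep_minutes": 447, "resting_hr_bpm": 57, "steps": 11230},
]


def format_duration(total_minutes):
    """Render a minute count as 'Xh YYm'."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


def encode_days_as_text(days):
    """Serialize numeric daily metrics into one text line per day,
    followed by a single summary line with the average sleep duration."""
    lines = [
        f"{d['date']}: slept {format_duration(d['sleep_minutes'])}, "
        f"resting HR {d['resting_hr_bpm']} bpm, {d['steps']} steps"
        for d in days
    ]
    # Round the mean to whole minutes so it formats like the daily values.
    avg_sleep = round(sum(d["sleep_minutes"] for d in days) / len(days))
    lines.append(f"Average sleep: {format_duration(avg_sleep)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(encode_days_as_text(SAMPLE_DAYS))
```

A text serialization like this is simple to build, but according to the abstract it was the multimodal encoding, not the textual one, that was required to match specialized discriminative models on sleep outcome prediction.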