Justin Cosentino, Anastasiya Belyaeva, Xin Liu, Nicholas A. Furlotte, Zhun Yang, Chace Lee, Erik Schenck, Yojan Patel, Jian Cui, Logan Douglas Schneider, Robby Bryant, Ryan G. Gomes, Allen Jiang, Roy Lee, Yun Liu, Javier Perez, Jameson K. Rogers, Cathy Speed, Shyam Tailor, Megan Walker, Jeffrey Yu, Tim Althoff, Conor Heneghan, John Hernandez, Mark Malhotra, Leor Stern, Yossi Matias, Greg S. Corrado, Shwetak Patel, Shravya Shetty, Jiening Zhan, Shruthi Prabhakara, Daniel McDuff, Cory Y. McLean

Abstract: In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We created and curated three datasets that test 1) production of personalized insights and recommendations from sleep patterns, physical activity, and physiological responses, 2) expert domain knowledge, and 3) prediction of self-reported sleep outcomes. For the first task we designed 857 case studies in collaboration with domain experts to assess real-world scenarios in sleep and fitness. Through comprehensive evaluation of domain-specific rubrics, we observed that Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness and, while experts remain superior for sleep, fine-tuning PH-LLM provided significant improvements in using relevant domain knowledge and personalizing information for sleep insights. We evaluated PH-LLM domain knowledge using multiple-choice sleep medicine and fitness examinations. PH-LLM achieved 79% on sleep and 88% on fitness, exceeding average scores from a sample of human experts. Finally, we trained PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encoding representations of wearable data, and demonstrate that multimodal encoding is required to match performance of specialized discriminative models. Although further development and evaluation are necessary in the safety-critical personal health domain, these results demonstrate both the broad knowledge and capabilities of Gemini models and the benefit of contextualizing physiological data for personal health applications as done with PH-LLM.

LLM-CGM: A Benchmark for Large Language Model-Enabled Querying of Continuous Glucose Monitoring Data for Conversational Diabetes Management

Leveraging Large Language Models to Analyze Continuous Glucose Monitoring Data: A Case Study

Let Curves Speak: A Continuous Glucose Monitor based Large Sensor Foundation Model for Diabetes Management

Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data

An adapted large language model facilitates multiple medical tasks in diabetes care

Towards a Personal Health Large Language Model

Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients

Graph-Augmented LLMs for Personalized Health Insights: A Case Study in Sleep Analysis

Large Language Models for Wearable Sensor-Based Human Activity Recognition, Health Monitoring, and Behavioral Modeling: A Survey of Early Trends, Datasets, and Challenges

Large Language Models in Healthcare: A Comprehensive Benchmark

ALPHA: AnomaLous Physiological Health Assessment Using Large Language Models

COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain

Large Language Models for Diabetes Care: Potentials and Prospects

Large Language Model Benchmarks in Medical Tasks

Multilevel functional distributional models with application to continuous glucose monitoring in diabetes clinical trials

From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models

HGMLA: A Multi-Task Learning Model for Assessment of HbA1c and GA Levels Using Short-Term CGM Sensor Data

PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models

Scalable information extraction from free text electronic health records using large language models

Transforming Wearable Data into Health Insights using Large Language Model Agents

Can Large Language Models Provide Emergency Medical Help Where There Is No Ambulance? A Comparative Study on Large Language Model Understanding of Emergency Medical Scenarios in Resource-Constrained Settings