Towards Foundation Models for Critical Care Time Series

Manuel Burger, Fedor Sergeev, Malte Londschien, Daphné Chopard, Hugo Yèche, Eike Gerdes, Polina Leshetkina, Alexander Morgenroth, Zeynep Babür, Jasmina Bogojeska, Martin Faltys, Rita Kuznetsova, Gunnar Rätsch
2024-11-25
Abstract: Notable progress has been made in generalist medical large language models across various healthcare areas. However, large-scale modeling of in-hospital time series data (such as vital signs, lab results, and treatments in critical care) remains underexplored. Existing datasets are relatively small, but combining them can enhance patient diversity and improve model robustness. To effectively utilize these combined datasets for large-scale modeling, it is essential to address the distribution shifts caused by varying treatment policies, necessitating the harmonization of treatment variables across the different datasets. This work aims to establish a foundation for training large-scale multivariate time series models on critical care data and to provide a benchmark for machine learning models in transfer learning across hospitals to study and address distribution shift challenges. We introduce a harmonized dataset for sequence modeling and transfer learning research, representing the first large-scale collection to include core treatment variables. Future plans involve expanding this dataset to support further advancements in transfer learning and the development of scalable, generalizable models for critical healthcare applications.
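The abstract's point about harmonizing treatment variables can be illustrated with a minimal sketch. It assumes, hypothetically, two source datasets (`hospital_a`, `hospital_b`) that record the same vasopressor under different item labels and dose units; the dataset names, item labels, `ITEM_MAP` table, and the `Record`, `to_mcg_per_kg_min`, and `harmonize` helpers are illustrative only and are not the paper's actual pipeline. Each record is mapped to one shared variable and converted to a common unit (mcg/kg/min here).

```python
from dataclasses import dataclass


@dataclass
class Record:
    dataset: str      # source dataset (hypothetical names below)
    item: str         # dataset-specific item label
    value: float      # dose rate in the recorded unit
    unit: str         # recorded unit
    weight_kg: float  # patient weight, needed for weight-based units


# Hypothetical mapping from (dataset, local item label) to a shared variable.
ITEM_MAP = {
    ("hospital_a", "Norepinephrine"): "norepinephrine",
    ("hospital_b", "noradrenaline"): "norepinephrine",
}


def to_mcg_per_kg_min(value: float, unit: str, weight_kg: float) -> float:
    """Convert a vasopressor dose rate to micrograms per kilogram per minute."""
    if unit == "mcg/kg/min":
        return value
    if unit == "mcg/min":
        return value / weight_kg
    if unit == "mg/h":
        # 1 mg = 1000 mcg and 1 h = 60 min
        return value * 1000.0 / 60.0 / weight_kg
    raise ValueError(f"unsupported unit: {unit}")


def harmonize(records):
    """Return (dataset, shared_variable, value_in_mcg_per_kg_min) tuples.

    Items without an entry in ITEM_MAP are skipped in this sketch.
    """
    out = []
    for r in records:
        variable = ITEM_MAP.get((r.dataset, r.item))
        if variable is None:
            continue
        out.append((r.dataset, variable, to_mcg_per_kg_min(r.value, r.unit, r.weight_kg)))
    return out
```

Keeping the mapping as an explicit table makes every harmonization decision auditable; this sketch silently drops unmapped items, whereas a real pipeline would more likely log them for review.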
Machine Learning
What problem does this paper attempt to address?
The key problem this paper addresses is how to train large-scale multivariate time series models on critical care data such as vital signs, laboratory results, and treatments. Specifically, the authors focus on the following issues:

1. **Insufficient dataset size and diversity**: Existing critical care time series datasets are relatively small and mostly come from a single medical center, which limits model generalization and robustness. Combining multiple datasets substantially increases the number and diversity of patients.
2. **Distribution shift**: Recording formats and treatment policies differ markedly between hospitals and countries, so a model trained on data from one hospital or country can perform poorly on data from another. Addressing these shifts is key to building a robust foundation model.
3. **Lack of a unified benchmark**: Most previous studies have used data from a single center, without a comprehensive multi-center evaluation. A benchmark framework is therefore needed to compare machine learning models on critical care time series.

To address these problems, the authors aim to:
- Create a large, multi-center critical care time series dataset that covers a wide range of clinical features and harmonizes core treatment variables.
- Establish a benchmark framework for evaluating machine learning models on this dataset, particularly under distribution shift across hospitals and countries.
Through these efforts, the authors hope to lay the groundwork for future foundation model research and to promote the use of deep learning in critical care, in particular few-shot learning and fine-tuning for small, specific patient groups.
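The cross-hospital benchmark and the few-shot fine-tuning use case described above could be set up with a leave-one-dataset-out split. The following is a minimal sketch under stated assumptions: each ICU stay is identified by an ID and tagged with its source dataset, and the `Split` and `leave_one_dataset_out` names and the `n_finetune` parameter are hypothetical, not taken from the paper. Each dataset in turn plays the unseen target hospital; all other datasets form the training set, and an optional small, reproducibly shuffled subset of the target is set aside for fine-tuning.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Split(NamedTuple):
    held_out: str   # target dataset treated as the unseen hospital
    train: list     # stay IDs from every other dataset
    finetune: list  # small subset of target stays for few-shot fine-tuning
    test: list      # remaining target stays, used only for evaluation


def leave_one_dataset_out(stays, n_finetune=0, seed=0):
    """Yield one Split per source dataset.

    `stays` is an iterable of (stay_id, dataset_name) pairs (hypothetical format).
    """
    by_dataset = defaultdict(list)
    for stay_id, dataset in stays:
        by_dataset[dataset].append(stay_id)

    for held_out in sorted(by_dataset):
        # Training data: all stays from the other datasets, in a fixed order.
        train = [s for d in sorted(by_dataset) if d != held_out for s in by_dataset[d]]
        # Shuffle target stays reproducibly before carving out the few-shot subset.
        target = list(by_dataset[held_out])
        random.Random(seed).shuffle(target)
        yield Split(held_out, train, target[:n_finetune], target[n_finetune:])
```

This sketch splits the target dataset by stay; if one patient can have several stays, grouping by patient ID instead would avoid leakage between the fine-tuning and test sets.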