IGNITE: Individualized GeNeration of Imputations in Time-series Electronic health records

Ghadeer O. Ghosheh, Jin Li, Tingting Zhu
2024-01-09
Abstract: Electronic Health Records (EHRs) present a valuable modality for driving personalized medicine, where treatment is tailored to fit individual-level differences. For this purpose, many data-driven machine learning and statistical models rely on the wealth of longitudinal EHRs to study patients' physiological and treatment effects. However, longitudinal EHRs tend to be sparse, with a high proportion of missing values, and the missingness itself can be informative and reflect the patient's underlying health status. Therefore, the success of data-driven models for personalized medicine depends heavily on how the EHR data are represented, including the physiological data, the treatments, and the missing values. To this end, we propose a novel deep-learning model that learns the underlying patient dynamics over time across multivariate data to generate personalized, realistic values conditioned on an individual's demographic characteristics and treatments. Our proposed model, IGNITE (Individualized GeNeration of Imputations in Time-series Electronic health records), utilises a conditional dual-variational autoencoder augmented with dual-stage attention to generate missing values for an individual. In IGNITE, we further propose a novel individualized missingness mask (IMM), which helps our model generate values based on the individual's observed data and missingness patterns. We further extend the use of IGNITE from imputing missingness to a personalized data synthesizer, where it generates EHRs that were never previously observed or even generates new patients for various applications. We validate our model on three large publicly available datasets and show that IGNITE outperforms state-of-the-art approaches in missing data reconstruction and task prediction.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses missing data in electronic health records (EHRs), an important challenge in personalized medicine. EHR data are multivariate, irregularly sampled, and have a high proportion of missing values, so handling them effectively is crucial for developing machine learning models that adapt to individual differences. The paper proposes a deep learning model called IGNITE, a conditional dual-variational autoencoder augmented with dual-stage attention, which generates personalized values for missing entries based on a patient's demographic characteristics, treatments, and individual missingness patterns. IGNITE introduces an Individualized Missingness Mask (IMM), which accounts for the missingness frequencies and patterns of the different dimensions of the time series and helps the model generate imputed values based on the individual's observed data.

Beyond imputation, the model can be extended to a personalized data synthesizer that generates EHR data that were never observed and even creates new patient records. The paper validates IGNITE on three large publicly available ICU datasets and reports that it outperforms existing methods in missing data reconstruction and task prediction, remaining robust under different missingness patterns and missingness rates. In summary, the paper investigates how deep generative models can handle and exploit missing data in EHRs in order to support personalized medicine and improve disease prediction, risk stratification, and treatment strategies.
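To make the ideas of a per-patient missingness pattern and of missing data reconstruction more concrete, the sketch below shows one simple way they could be represented. It is not the paper's IMM or its evaluation protocol: the function names (`observation_mask`, `per_feature_missing_rate`, `masked_mse`), the use of NaN to mark missing entries, and the held-out masking scheme are all assumptions made for illustration.

```python
import numpy as np

def observation_mask(x):
    """Binary mask for one patient's series: 1 where observed, 0 where NaN."""
    return (~np.isnan(x)).astype(np.float32)

def per_feature_missing_rate(mask):
    """Fraction of time steps at which each feature is missing (hypothetical helper)."""
    return 1.0 - mask.mean(axis=0)

def masked_mse(x_true, x_hat, eval_mask):
    """Mean squared error restricted to entries where eval_mask == 1.

    Assumes eval_mask marks observed values that were hidden from the
    imputation model, so the ground truth at those positions is known.
    """
    selected = eval_mask.astype(bool)
    return float(np.mean((x_true[selected] - x_hat[selected]) ** 2))

# Toy patient: 4 time steps, 2 features, NaN marks a missing measurement.
x = np.array([
    [1.0, np.nan],
    [2.0, np.nan],
    [np.nan, 5.0],
    [4.0, np.nan],
])
mask = observation_mask(x)
rates = per_feature_missing_rate(mask)  # feature 0 is mostly observed, feature 1 mostly missing

# Toy reconstruction check on two held-out entries.
x_true = np.array([[1.0, 2.0], [3.0, 4.0]])
x_hat = np.array([[1.0, 0.0], [0.0, 4.0]])
eval_mask = np.array([[0, 1], [1, 0]])
error = masked_mse(x_true, x_hat, eval_mask)
```

Here `rates` differs per feature for the same patient, which is the kind of individual-level signal a missingness-aware model could condition on. Restricting the error to `eval_mask` scores an imputation method only where the true value is known.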