Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation

Wenfang Yao, Chen Liu, Kejing Yin, William K. Cheung, Jing Qin
2024-10-23
Abstract: Integrating multi-modal clinical data, such as electronic health records (EHR) and chest X-ray images (CXR), is particularly beneficial for clinical prediction tasks. However, in a temporal setting, multi-modal data are often inherently asynchronous. EHR can be continuously collected but CXR is generally taken with a much longer interval due to its high cost and radiation dose. When clinical prediction is needed, the last available CXR image might have been outdated, leading to suboptimal predictions. To address this challenge, we propose DDL-CXR, a method that dynamically generates an up-to-date latent representation of the individualized CXR images. Our approach leverages latent diffusion models for patient-specific generation strategically conditioned on a previous CXR image and EHR time series, providing information regarding anatomical structures and disease progressions, respectively. In this way, the interaction across modalities could be better captured by the latent CXR generation process, ultimately improving the prediction performance. Experiments using MIMIC datasets show that the proposed model could effectively address asynchronicity in multimodal fusion and consistently outperform existing methods.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses temporal asynchrony in multimodal clinical data, specifically between electronic health records (EHR) and chest X-ray images (CXR):

1. **Asynchrony of multimodal data**: In the clinical setting, EHR can be collected continuously, while CXR is usually taken at much longer intervals due to its high cost and radiation dose. When a clinical prediction is required, the most recent available CXR image may be out of date, resulting in suboptimal predictions.
2. **Limitations of existing methods**: Existing methods usually adopt a "carry-forward" strategy, that is, they use the last CXR image taken for downstream prediction tasks. This strategy ignores rapid changes in the patient's condition between the last CXR acquisition and the prediction time, which leads to suboptimal prediction performance.

To solve these problems, the authors propose DDL-CXR (Diffusion-based Dynamic Latent Chest X-ray Image Generation), a method based on latent diffusion models (LDM) that generates an up-to-date latent CXR representation consistent with the individual patient's condition. A representation reflecting the current condition is thus available at prediction time, which alleviates the asynchrony between EHR and CXR and improves prediction accuracy.

### Specific contributions:

- **Generate updated individualized CXR images**: DDL-CXR is described as the first work to attempt to improve clinical multimodal fusion by generating updated individualized CXR images.
- **Contrastive learning method**: A contrastive learning method is proposed to train the LDM, enabling it to capture and use the disease progression information in EHR.
- **Experimental results**: Experiments on MIMIC datasets show that DDL-CXR outperforms existing methods in both multimodal clinical prediction and individualized CXR generation.

### Method overview:

The DDL-CXR framework consists of two stages:

1. **LDM stage**: Learn to generate an up-to-date latent CXR representation conditioned on a previous CXR image and the EHR time series.
2. **Prediction stage**: Fuse the generated latent CXR with other historical data for downstream clinical prediction tasks.

In this way, DDL-CXR can better capture the cross-modal interactions between EHR and CXR, thereby improving the accuracy of clinical prediction.
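
The asynchrony that motivates the method can be made concrete with a small sketch. It is not code from the paper: the function names, the made-up timestamps, and the choice to collect only the EHR records observed after the last CXR are assumptions for illustration. The sketch picks the CXR that a carry-forward baseline would reuse, measures how stale it is at prediction time, and lists the EHR records that arrived in between, which is the kind of information a generator conditioned on a previous CXR and EHR time series could draw on.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def last_cxr_index(cxr_times: List[float], t_pred: float) -> Optional[int]:
    """Index of the most recent CXR taken at or before t_pred.

    This is the image a carry-forward baseline would reuse.
    cxr_times must be sorted in ascending order (hours since admission).
    """
    i = bisect_right(cxr_times, t_pred)
    return i - 1 if i > 0 else None


def conditioning_inputs(
    cxr_times: List[float], ehr_times: List[float], t_pred: float
) -> Optional[Tuple[int, List[int], float]]:
    """Return (last CXR index, EHR indices after that CXR, staleness in hours).

    Hypothetical helper: restricting EHR to the interval (t_cxr, t_pred]
    is an illustrative assumption, not a detail confirmed by the source.
    """
    idx = last_cxr_index(cxr_times, t_pred)
    if idx is None:
        return None  # no CXR available yet, nothing to carry forward
    t_cxr = cxr_times[idx]
    ehr_idx = [j for j, t in enumerate(ehr_times) if t_cxr < t <= t_pred]
    return idx, ehr_idx, t_pred - t_cxr


# Made-up timestamps (hours since admission), for illustration only.
cxr_times = [2.0, 30.0]
ehr_times = [0.0, 6.0, 12.0, 24.0, 36.0, 47.0]
print(conditioning_inputs(cxr_times, ehr_times, t_pred=48.0))
# prints (1, [4, 5], 18.0)
```

In this toy case the carried-forward image is 18 hours old at prediction time, and two EHR records have been observed since it was taken; a carry-forward baseline would ignore the gap between that image and the patient's current state.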