Abstract: Integrating multi-modal clinical data, such as electronic health records (EHR) and chest X-ray images (CXR), is particularly beneficial for clinical prediction tasks. However, in a temporal setting, multi-modal data are often inherently asynchronous. EHR can be collected continuously, but CXR is generally taken at much longer intervals due to its high cost and radiation dose. When a clinical prediction is needed, the last available CXR image may already be outdated, leading to suboptimal predictions. To address this challenge, we propose DDL-CXR, a method that dynamically generates an up-to-date latent representation of the individualized CXR images. Our approach leverages latent diffusion models for patient-specific generation strategically conditioned on a previous CXR image and EHR time series, providing information regarding anatomical structures and disease progressions, respectively. In this way, the interaction across modalities can be better captured by the latent CXR generation process, ultimately improving the prediction performance. Experiments using MIMIC datasets show that the proposed model can effectively address asynchronicity in multimodal fusion and consistently outperforms existing methods.

What problem does this paper attempt to address?

This paper addresses temporal asynchrony in multimodal clinical data, specifically between electronic health records (EHR) and chest X-ray images (CXR):

1. **Asynchrony of multimodal data**: In the clinical setting, EHR can be collected continuously, while CXR is usually taken at much longer intervals due to its high cost and radiation dose. When a clinical prediction is required, the most recently available CXR image may be out of date, resulting in poor prediction performance.
2. **Limitations of existing methods**: Existing methods usually adopt a "carry-forward" strategy, that is, they use the last CXR image taken for downstream prediction tasks. This strategy ignores changes in the patient's condition between the acquisition of the last CXR image and the prediction time, which can be rapid, and so leads to sub-optimal prediction performance.

To solve these problems, the authors propose DDL-CXR (Diffusion-based Dynamic Latent Chest X-ray Image Generation), a method based on latent diffusion models (LDM) that generates an up-to-date latent CXR representation consistent with the patient's current condition. Because this representation is generated at prediction time, it alleviates the asynchrony between EHR and CXR and improves prediction accuracy.

### Specific contributions:

- **Generate updated individualized CXR images**: The authors present DDL-CXR as the first work to attempt to improve clinical multimodal fusion by generating updated individualized CXR images.
- **Contrastive learning method**: A contrastive learning method is proposed to train the LDM, enabling it to capture and use the disease progression information in EHR.
- **Experimental results**: Experiments show that DDL-CXR outperforms existing methods in both multimodal clinical prediction and individualized CXR generation.

### Method overview:

The DDL-CXR framework consists of two stages:

1. **LDM stage**: Learn to generate an up-to-date latent CXR representation conditioned on a previous CXR image and the EHR time series.
2. **Prediction stage**: Fuse the generated latent CXR with other historical data for downstream clinical prediction tasks.

In this way, DDL-CXR can better capture the cross-modal interactions between EHR and CXR, thereby improving the accuracy of clinical prediction.
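
The two stages can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: it assumes a standard DDPM-style forward noising process with a linear beta schedule, and every function name, the mean-pooling "EHR encoder", and the concatenation-based fusion are hypothetical stand-ins for components the summary does not specify.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t (assumed linear schedule)."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(z0, t, rng):
    """Forward diffusion on a latent: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    abar = alpha_bar(t)
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    z_t = [math.sqrt(abar) * z + math.sqrt(1.0 - abar) * e for z, e in zip(z0, eps)]
    return z_t, eps


def build_condition(prev_cxr_latent, ehr_series):
    """Stage 1 conditioning (hypothetical): previous CXR latent + per-feature EHR mean.

    The previous CXR carries anatomical structure; the EHR series carries
    disease progression. A real model would use learned encoders instead.
    """
    n = len(ehr_series)
    ehr_summary = [sum(column) / n for column in zip(*ehr_series)]
    return list(prev_cxr_latent) + ehr_summary


def fuse_for_prediction(generated_latent, ehr_series):
    """Stage 2 fusion (hypothetical): generated latent + most recent EHR observation."""
    return list(generated_latent) + list(ehr_series[-1])


# Toy data: a 4-dimensional latent and three EHR time steps with two features each.
rng = random.Random(0)
prev_latent = [0.2, -0.1, 0.5, 0.0]       # latent of the last available CXR
target_latent = [0.3, -0.2, 0.4, 0.1]     # latent of a later CXR (training target)
ehr = [[80.0, 97.0], [92.0, 94.0], [101.0, 91.0]]

condition = build_condition(prev_latent, ehr)
noisy_latent, noise = add_noise(target_latent, t=500, rng=rng)
# A denoising network would be trained to predict `noise` from
# (noisy_latent, t, condition); at prediction time the denoised latent
# would be passed to fuse_for_prediction together with the EHR data.
```

In this sketch the previous CXR latent and the EHR summary enter only through the conditioning vector, which mirrors the division of labour described above: the earlier image supplies anatomy, and the EHR time series supplies what has changed since.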

Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation
