Abstract: The broad adoption of electronic health record (EHR) systems has produced a large volume of clinical data, creating opportunities for data-driven healthcare research on a range of clinical problems. Machine learning and deep learning methods are widely used in medical informatics and healthcare because they can extract insights from raw data. When adapting deep learning models to EHR data, it is essential to consider the data's heterogeneous nature: an EHR contains patient records from various sources, including medical tests (e.g., blood tests, microbiology tests), medical imaging, diagnoses, medications, procedures, and clinical notes. Together, these modalities complement each other and provide a holistic view of a patient's health status. Combining data from intrinsically different modalities is therefore challenging but intuitively promising for deep learning on EHR. To assess the promise of multimodal data, we introduce a comprehensive fusion framework that integrates temporal variables, medical images, and clinical notes in EHR to improve clinical risk prediction. Early, joint, and late fusion strategies are employed to combine data from the different modalities. We test the model on three predictive tasks: in-hospital mortality, long length of stay, and 30-day readmission. Experimental results show that multimodal models outperform unimodal models on these tasks. Additionally, by training models with different combinations of input modalities, we calculate the Shapley value of each modality to quantify its contribution to multimodal performance. The results show that temporal variables tend to be more helpful than chest X-ray (CXR) images and clinical notes in the three explored predictive tasks.
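The modality-level Shapley analysis described above can be illustrated with a minimal sketch. It assumes each modality is treated as a "player" and that a performance score (for example, AUROC) is available for a model trained on every subset of modalities. The function name `shapley_values`, the modality labels, the chance-level baseline of 0.50 for the empty set, and all scores below are hypothetical placeholders chosen for illustration; they are not results or code from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small set of players.

    `value` maps a frozenset of players to the score obtained
    when only those players (modalities) are used as input.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        # Enumerate every coalition S that does not contain p.
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal gain from adding p to coalition S.
                total += weight * (value[s | {p}] - value[s])
        phi[p] = total
    return phi


# Hypothetical scores for illustration only (NOT results from the paper).
# The empty set is assumed to score at chance level (0.50).
modalities = ["temporal", "cxr", "notes"]
scores = {
    frozenset(): 0.50,
    frozenset({"temporal"}): 0.80,
    frozenset({"cxr"}): 0.70,
    frozenset({"notes"}): 0.70,
    frozenset({"temporal", "cxr"}): 0.84,
    frozenset({"temporal", "notes"}): 0.84,
    frozenset({"cxr", "notes"}): 0.76,
    frozenset({"temporal", "cxr", "notes"}): 0.86,
}

phi = shapley_values(modalities, scores)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.4f}")
```

By construction, the per-modality values sum to the gap between the full multimodal score and the empty-set baseline, which is what makes them usable as an additive attribution of multimodal performance. With three modalities this requires a score for all eight subsets; the exact enumeration grows exponentially with the number of modalities.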