Abstract: Cardiac disease evaluation depends on multiple diagnostic modalities: electrocardiogram (ECG) to diagnose abnormal heart rhythms, and imaging modalities such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT) and echocardiography to detect signs of structural abnormalities. Each of these modalities brings complementary information for a better diagnosis of cardiac dysfunction. However, training a machine learning (ML) model with data from multiple modalities is a challenging task, as it increases the dimensionality of the input space while the number of samples stays constant. In fact, as the dimension of the input space increases, the volume of data required for accurate generalisation grows exponentially. In this work, we address this issue for the application of Ventricular Arrhythmia (VA) prediction based on combined clinical and CT imaging features, where we constrain the learning process on medical images (CT) using the prior knowledge acquired from clinical data. The VA classifier is fed with features extracted from a 3D myocardium thickness map (TM) of the left ventricle. The TM is generated by our pipeline from the imaging input, and a Graph Convolutional Network is used as the feature extractor of the 3D TM. We introduce a novel Sequential Fusion method and evaluate its performance against traditional Early Fusion techniques and single-modality models. The cross-validation results show that the Sequential Fusion model achieved the highest average scores of 80.7% $\pm$ 4.4 Sensitivity and 73.1% $\pm$ 6.0 F1 score, outperforming the Early Fusion model at 65.0% $\pm$ 8.9 Sensitivity and 63.1% $\pm$ 6.3 F1 score. Both fusion models achieved better scores than the single-modality models, whose average Sensitivity and F1 score are 62.8% $\pm$ 10.1 and 52.1% $\pm$ 6.5 for the clinical data modality, and 62.9% $\pm$ 6.3 and 60.7% $\pm$ 5.3 for the medical images modality.

What problem does this paper attempt to address?

This paper aims to improve the prediction of ventricular arrhythmia (VA) by combining multimodal data (clinical data and CT images). Specifically, it addresses the shortage of samples that arises when multimodal learning increases the input dimensionality, and it proposes a new sequential fusion method to better integrate information from the different modalities.

### Problem Background

Cardiac disease assessment depends on multiple diagnostic methods:

- Electrocardiogram (ECG) is used to diagnose abnormal heart rhythms.
- Imaging examinations such as magnetic resonance imaging (MRI), computed tomography (CT) and echocardiography are used to detect structural abnormalities.

These modalities provide complementary information, which helps diagnose cardiac dysfunction more accurately. However, training machine-learning models on multimodal data is challenging: as the dimension of the input space increases, the required sample size grows exponentially, while the actual sample size is usually fixed. As a result, the model generalizes less well.

### Research Objectives

To meet this challenge, the research focuses on the task of predicting ventricular arrhythmia (VA) and sets the following objectives:

1. **Combine clinical data and CT images**: Use a 3D myocardium thickness map (TM) as a feature source.
2. **Introduce a sequential fusion method**: Constrain the learning process on medical image data using prior knowledge obtained from clinical data.
3. **Evaluate the new method**: Compare the performance of the sequential fusion method with that of traditional early fusion and of the single-modality models.
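As a rough illustration of the dimensionality argument in the Problem Background, the sketch below counts the cells of a regular grid laid over the input space and shows how little of that grid a fixed number of samples can cover as dimensions are added. The grid model, the function names, the choice of 10 bins per axis, and the cohort size of 500 are illustrative assumptions and do not come from the paper.

```python
def grid_cells(dims: int, bins_per_axis: int = 10) -> int:
    """Cells in a regular grid with `bins_per_axis` bins along each of `dims` axes."""
    return bins_per_axis ** dims


def max_coverage(n_samples: int, dims: int, bins_per_axis: int = 10) -> float:
    """Upper bound on the fraction of grid cells that can hold at least one sample."""
    return min(1.0, n_samples / grid_cells(dims, bins_per_axis))


if __name__ == "__main__":
    n_samples = 500  # hypothetical, fixed cohort size
    for dims in (1, 2, 3, 6):
        # The cell count grows exponentially with dims; coverage shrinks accordingly.
        print(dims, grid_cells(dims), max_coverage(n_samples, dims))
```

With the sample count held fixed, each added input dimension multiplies the number of cells by the bin count, which is the sense in which the data requirement grows exponentially.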
### Method Innovation

The main innovations of this research are:

- **Sequential fusion**: A classifier is first trained on the low-dimensional data (clinical data), and the knowledge it learns is then applied as a constraint when learning from the high-dimensional data (medical images).
- **Handling class imbalance**: A weighting strategy gives the minority class (VA+) a higher weight, which makes the model more robust on an imbalanced data set.

### Experimental Results

The experimental results show that the sequential fusion model outperforms the other methods on the reported metrics:

- Its average sensitivity is 80.7% ± 4.4 and its F1 score is 73.1% ± 6.0, which are 15.7 and 10.0 percentage points higher, respectively, than those of the early fusion model (65.0% and 63.1%).
- It also performs better than the single-modality models.

### Conclusion

This research shows that the sequential fusion method can integrate multimodal data effectively and improve ventricular arrhythmia prediction. Future research can explore how to progressively constrain higher-dimensional data to make the classification model more robust, and can consider adding further modalities to improve prediction.

### Formula Display

In this research, formulas are mainly used to calculate sample weights and loss functions:

1. **Prior modal weight calculation**:

   \[ \text{prior\_weight}_i=\frac{n}{k\cdot n_{s_i}} \]

   where $n$ is the total number of samples, $k = 2$ is the number of sets in this study, and $n_{s_i}$ is the number of samples in each set.

2. **Weighting strategy**:

   - Equal weighting strategy:

     \[ w_l=\alpha\cdot\text{class\_weight}_j+\beta\cdot\text{prior\_weight}_i \]

   - Stratified weighting strategy:

     \[ w_l = \begin{cases} \alpha\cdot\text{class\_weight}_j & \text{for } s_1\ (i = 1)\\ \alpha\cdot\text{class\_weight}_j+\beta\cdot\ldots \end{cases} \]
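The prior weight and the equal weighting strategy above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, `class_weight_j` is assumed to take the same balanced form $n / (c \cdot n_j)$ as the prior weight (the text does not define it), and the values of `alpha` and `beta` are placeholders. The stratified variant is not sketched because its second case is incomplete in the text above.

```python
from collections import Counter


def balanced_weights(groups):
    """Return n / (k * n_g) per group g, with n samples and k distinct groups.

    With k = 2 prior sets this matches prior_weight_i = n / (k * n_{s_i}).
    Using the same form for class weights is an assumption of this sketch.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return {g: n / (k * c) for g, c in counts.items()}


def equal_weighting(classes, prior_sets, alpha=0.5, beta=0.5):
    """Per-sample weight w_l = alpha * class_weight_j + beta * prior_weight_i.

    `classes` holds each sample's class label, `prior_sets` its prior set
    (s1 or s2). alpha and beta are placeholder values, not from the paper.
    """
    class_w = balanced_weights(classes)
    prior_w = balanced_weights(prior_sets)
    return [alpha * class_w[c] + beta * prior_w[s]
            for c, s in zip(classes, prior_sets)]


if __name__ == "__main__":
    # Toy data: one VA+ sample (1) among four, two samples per prior set.
    classes = [1, 0, 0, 0]
    prior_sets = ["s1", "s1", "s2", "s2"]
    print(equal_weighting(classes, prior_sets))
```

In this toy example the single minority-class sample receives the largest weight, which is the behaviour the class imbalance strategy is meant to produce.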

Constraint-Based Model in Multimodal Learning to Improve Ventricular Arrhythmia Prediction

MetaVA: Curriculum Meta-learning and Pre-fine-tuning of Deep Neural Networks for Detecting Ventricular Arrhythmias based on ECGs

Multimodal risk prediction with physiological signals, medical images and clinical notes

Multimodal learning for fetal distress diagnosis using a multimodal medical information fusion framework

Research on Multimodal Fusion of Temporal Electronic Medical Records

Multimodal Variational Autoencoder for Low-cost Cardiac Hemodynamics Instability Detection

Multi-modality Multi-attention Network for Ventricular Arrhythmia Classification

Multimodal Learning To Improve Cardiac Late Mechanical Activation Detection From Cine MR Images

CNN-LSTM Based Multimodal MRI and Clinical Data Fusion for Predicting Functional Outcome in Stroke Patients

Full left ventricle quantification via deep multitask relationships learning

Enhancing Cardiovascular Disease Prediction through Multi-Modal Self-Supervised Learning

Integrating multimodal information in machine learning for classifying acute myocardial infarction

Efficient Multi-View Fusion and Flexible Adaptation to View Missing in Cardiovascular System Signals

Multimodal fusion with deep neural networks for leveraging CT imaging and electronic health record: a case-study in pulmonary embolism detection

Multivariate Mixture Model for Cardiac Segmentation from Multi-Sequence MRI.

Improving Valvular Pathologies and Ventricular Dysfunction Diagnostic Efficiency Using Combined Auscultation and Electrocardiography Data: A Multimodal AI Approach

Towards a vision foundation model for comprehensive assessment of Cardiac MRI

ECG Heartbeat Classification Using Multimodal Fusion

Cardiac disease discrimination from 3D-convolutional kinematic patterns on cine-MRI sequences

Multimodal Fusion of Echocardiography and Electronic Health Records for the Detection of Cardiac Amyloidosis

Machine learning for pacemaker implantation prediction after TAVI using multimodal imaging data