Abstract: Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions and is crucial in assisting clinicians. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation, which is time-consuming and requires clinical expertise. To automate ECG report generation and ensure its versatility, we propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions. To facilitate future research, we establish a benchmark to evaluate MEIT with various LLM backbones across two large-scale ECG datasets. Our approach uniquely aligns the representations of the ECG signal and the report, and we conduct extensive experiments to benchmark MEIT with nine open-source LLMs using more than 800,000 ECG reports. MEIT's results underscore the superior performance of instruction-tuned LLMs, showcasing their proficiency in quality report generation, zero-shot capabilities, and resilience to signal perturbation. These findings emphasize the efficacy of our MEIT framework and its potential for real-world clinical application.

What problem does this paper attempt to address?

The main problem this paper addresses is the automatic generation of electrocardiogram (ECG) reports. Current research focuses mainly on classifying heart conditions from ECG data and overlooks report generation, a task that is both time-consuming and dependent on clinical expertise. To address this gap, the paper proposes the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to use large language models (LLMs) and multimodal instructions to generate ECG reports.

### Main Problems

1. **Automatically Generating ECG Reports**: Although existing research has made progress in classifying heart conditions from ECG data, automatic report generation remains underexplored. The proposed method aims to enable LLMs to generate high-quality ECG reports through multimodal instruction tuning.
2. **Fusion of Multimodal Data**: Semantic alignment between ECG signals and text reports is a key issue. The paper proposes an attention-based fusion method that enables LLMs to understand ECG signals and generate the corresponding reports.
3. **Zero-Shot Learning Ability**: ECG signals may vary between datasets and devices. The paper evaluates the MEIT framework on zero-shot tasks to verify its generalization to unseen datasets.
4. **Robustness Analysis**: In clinical settings, ECG signals may be subject to noise interference. The paper tests the robustness of the MEIT framework to signal perturbation by adding Gaussian noise.

### Solutions

1. **Multimodal ECG Instruction-Tuning Framework (MEIT)**:
   - **Data Preparation**: Constructed a multimodal instruction dataset containing ECG recordings, human instructions, and paired reports.
   - **Model Architecture**: Designed a multimodal LLM that aligns ECG signals with text representations through a lightweight attention-based fusion module.
   - **Instruction Tuning**: Through instruction tuning, the model can generate professional-level reports under different prompts.
2. **Benchmark Testing**:
   - **Dataset**: Used two large-scale ECG datasets (PTB-XL and MIMIC-IV-ECG).
   - **Evaluation Tasks**: Three tasks: report-generation quality, zero-shot learning ability, and robustness to signal perturbation.
   - **Evaluation Metrics**: Used multiple natural-language-generation metrics: BLEU, METEOR, ROUGE, CIDEr-D, and BERTScore.

### Experimental Results

- **Report-Generation Quality**: The MEIT framework significantly outperforms small-scale language models on multiple evaluation metrics and performs well across large language models.
- **Zero-Shot Learning Ability**: The instruction-tuned model retains good generalization ability on unseen datasets.
- **Robustness**: Even in a high-noise environment, the MEIT framework can still generate high-quality reports, demonstrating strong robustness.

### Conclusion

The MEIT framework provides an effective method for automated ECG report generation. It improves the quality of generated reports and strengthens the model's generalization across different datasets and noise conditions, laying a foundation for future research on medical signal-to-text generation.
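
The robustness test summarized above perturbs ECG signals with Gaussian noise. The sketch below illustrates one way such a perturbation could be applied. It is not the paper's implementation: the helper name `add_gaussian_noise`, the choice to control the noise level through a signal-to-noise ratio in decibels, and the toy sine-wave "ECG" are all assumptions made for illustration, since the summary does not state how the noise level was set.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, seed=None):
    """Return a noisy copy of a 1-D signal at a target SNR in dB.

    Hypothetical helper: parameterising the noise by SNR is an
    assumption, not something taken from the paper.
    """
    rng = random.Random(seed)
    # Average signal power: mean of the squared samples.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise), so
    # P_noise = P_signal / 10^(SNR / 10).
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    # Add independent zero-mean Gaussian noise to every sample.
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Toy single-lead stand-in for an ECG: a 5 Hz sine sampled at 500 Hz for 1 s.
clean = [math.sin(2 * math.pi * 5 * t / 500) for t in range(500)]
noisy = add_gaussian_noise(clean, snr_db=10, seed=0)
```

In a robustness evaluation of this kind, the noisy signal would be fed to the report generator in place of the clean one, and the generation metrics would be compared across noise levels. Lower `snr_db` values correspond to stronger perturbation.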

MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation

Teach Multimodal LLMs to Comprehend Electrocardiographic Images

Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report

ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis

Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling

Improving Clinical Expertise in Large Language Models Using Electronic Medical Records

De-biased Multimodal Electrocardiogram Analysis

Automated Medical Report Generation for ECG Data: Bridging Medical Text and Signal Processing with Deep Learning

MetaVA: Curriculum Meta-learning and Pre-fine-tuning of Deep Neural Networks for Detecting Ventricular Arrhythmias based on ECGs

Cross-modal multiscale multi-instance learning for long-term ECG classification

Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning

Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement

Large language models enabled multiagent ensemble method for efficient EHR data labeling

Automated Transformation of Unstructured Cardiovascular Diagnostic Reports into Structured Datasets Using Sequentially Deployed Large Language Models

MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis

Transfer Knowledge from Natural Language to Electrocardiography: Can We Detect Cardiovascular Disease Through Language Models?

ME-GAN: Learning Panoptic Electrocardio Representations for Multi-view ECG Synthesis Conditioned on Heart Diseases.

Dia-LLaMA: Towards Large Language Model-driven CT Report Generation

ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text

A Multi-Task Group Bi-LSTM Networks Application on Electrocardiogram Classification