MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation

Zhongwei Wan, Che Liu, Xin Wang, Chaofan Tao, Hui Shen, Zhenwu Peng, Jie Fu, Rossella Arcucci, Huaxiu Yao, Mi Zhang
2024-06-18
Abstract: The electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions and is crucial in assisting clinicians. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation, which is time-consuming and requires clinical expertise. To automate ECG report generation and ensure its versatility, we propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions. To facilitate future research, we establish a benchmark to evaluate MEIT with various LLM backbones across two large-scale ECG datasets. Our approach uniquely aligns the representations of the ECG signal and the report, and we conduct extensive experiments to benchmark MEIT with nine open-source LLMs using more than 800,000 ECG reports. MEIT's results underscore the superior performance of instruction-tuned LLMs, showcasing their proficiency in high-quality report generation, zero-shot capabilities, and resilience to signal perturbation. These findings emphasize the efficacy of our MEIT framework and its potential for real-world clinical application.
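The abstract states that MEIT aligns the representations of the ECG signal and the report. As a minimal sketch of one common way to inject a signal into a language model, assuming single-head scaled dot-product cross-attention in which text-token embeddings act as queries over ECG-patch embeddings, the snippet below adds the attended ECG summary back to each text token as a residual. The function names, the toy vectors, and the absence of learned projections are assumptions for illustration; this is not MEIT's actual fusion module.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention_fuse(text_tokens: List[Vector], ecg_tokens: List[Vector]) -> List[Vector]:
    """Hypothetical fusion step: text tokens (queries) attend over ECG tokens.

    Keys and values are the raw ECG embeddings (no learned projections, one head).
    The attended ECG summary is added to each text token as a residual, so the
    output has the same shape as text_tokens.
    """
    if not ecg_tokens:
        raise ValueError("need at least one ECG token")
    d = len(ecg_tokens[0])
    scale = 1.0 / math.sqrt(d)  # standard 1/sqrt(d) scaling of dot-product scores
    fused = []
    for q in text_tokens:
        weights = softmax([dot(q, k) * scale for k in ecg_tokens])
        # Weighted sum of ECG embeddings, one output dimension at a time.
        attended = [sum(w * v[i] for w, v in zip(weights, ecg_tokens)) for i in range(d)]
        fused.append([q_i + a_i for q_i, a_i in zip(q, attended)])
    return fused


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]  # two toy text-token embeddings
    ecg = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # three toy ECG-patch embeddings
    for row in cross_attention_fuse(text, ecg):
        print([round(x, 3) for x in row])
```

In a real multimodal LLM the queries, keys, and values would come from learned linear projections, usually split across several heads; they are dropped here so the attention arithmetic stays easy to follow.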
Computation and Language, Machine Learning, Signal Processing
What problem does this paper attempt to address?
This paper addresses the automatic generation of electrocardiogram (ECG) reports. Current research mainly uses ECG data to classify cardiac conditions and largely ignores report generation, a task that is both time-consuming and dependent on clinical expertise. To address this gap, the paper proposes the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to generate ECG reports with large language models (LLMs) and multimodal instructions.

### Main Problems

1. **Automatic ECG Report Generation**: Existing work has made progress in classifying cardiac conditions from ECG data, but automatic report generation remains underdeveloped. The proposed method aims to let LLMs generate high-quality ECG reports through multimodal instruction tuning.
2. **Multimodal Data Fusion**: Semantic alignment between ECG signals and text reports is a key issue. The paper proposes an attention-based fusion method that enables LLMs to interpret ECG signals and generate the corresponding reports.
3. **Zero-Shot Learning Ability**: ECG signals can differ across datasets and recording devices. The paper evaluates MEIT in a zero-shot setting to verify that it generalizes to unseen datasets.
4. **Robustness Analysis**: In clinical settings, ECG signals may be corrupted by noise. The paper tests MEIT's robustness to signal perturbation by adding Gaussian noise to the input signals.

### Solutions

1. **Multimodal ECG Instruction Tuning Framework (MEIT)**:
   - **Data Preparation**: Constructs a multimodal instruction dataset of ECG recordings, human instructions, and paired reports.
   - **Model Architecture**: Designs a multimodal LLM that aligns ECG signals with text representations through a lightweight attention-based fusion module.
   - **Instruction Tuning**: Instruction tuning enables the model to generate professional-level reports in response to different prompts.
2. **Benchmark Testing**:
   - **Datasets**: Two large-scale ECG datasets, PTB-XL and MIMIC-IV-ECG.
   - **Evaluation Tasks**: Three tasks: report generation quality, zero-shot generalization, and robustness to signal perturbation.
   - **Evaluation Metrics**: Standard natural language generation metrics, including BLEU, METEOR, ROUGE, CIDEr-D, and BERTScore.

### Experimental Results

- **Report Generation Quality**: MEIT significantly outperforms small-scale language models across multiple evaluation metrics and performs well across the large language model backbones tested.
- **Zero-Shot Learning Ability**: Instruction-tuned models retain good generalization on unseen datasets.
- **Robustness**: Even under heavy noise, MEIT still generates high-quality reports, indicating strong robustness.

### Conclusion

MEIT provides an effective method for automated ECG report generation. It improves report quality and strengthens the model's generalization across datasets and noise conditions, laying a foundation for future research on medical signal-to-text generation.
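The robustness experiment perturbs the input ECGs with Gaussian noise. A minimal sketch of such a perturbation, assuming the signal is stored as a list of leads (each a list of float samples) and that the noise is i.i.d. and zero-mean with a fixed standard deviation `sigma`. The helper name `perturb_with_gaussian_noise` and the noise levels in the example are hypothetical, since the exact perturbation settings are not given in this summary.

```python
import random
from typing import List

Signal = List[List[float]]  # leads x samples


def perturb_with_gaussian_noise(ecg: Signal, sigma: float, seed: int = 0) -> Signal:
    """Return a copy of a multi-lead ECG with i.i.d. zero-mean Gaussian noise added.

    sigma is the noise standard deviation in the signal's own units. The values
    used in the paper are not stated here, so any value is a placeholder.
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    rng = random.Random(seed)  # seeded so each perturbed copy is reproducible
    # Build new lists so the clean input is left untouched.
    return [[x + rng.gauss(0.0, sigma) for x in lead] for lead in ecg]


if __name__ == "__main__":
    clean = [[0.0, 0.1, 0.9, 0.1, 0.0], [0.0, -0.05, 0.4, -0.05, 0.0]]  # 2 toy leads
    for level in (0.0, 0.05, 0.2):  # hypothetical noise levels
        noisy = perturb_with_gaussian_noise(clean, level, seed=42)
        print(level, [round(x, 3) for x in noisy[0]])
```

A robustness benchmark would then generate reports from both the clean and the perturbed copies and compare their metric scores; fixing the seed keeps the perturbation identical across the models being compared.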
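ROUGE, one of the listed metrics, has a longest-common-subsequence variant (ROUGE-L) that rewards in-order word overlap between a generated report and its reference. The sketch below computes a ROUGE-L F1 score. The lowercase whitespace tokenization and the equal weighting of precision and recall (β = 1) are assumptions, and published scores come from standard toolkits with their own tokenizers, so values from this helper are not directly comparable to the paper's.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via O(len(a) * len(b)) dynamic programming."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a token in a or b
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between a generated report and a reference report.

    Tokenization is plain lowercase whitespace splitting (an assumption);
    official toolkits apply their own tokenizers and optional stemming.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "sinus rhythm with left axis deviation"
    candidate = "sinus rhythm left axis deviation"
    print(round(rouge_l_f1(candidate, reference), 4))
```

For the reference "sinus rhythm with left axis deviation" and the candidate "sinus rhythm left axis deviation", the LCS covers all 5 candidate tokens out of 6 reference tokens, so precision is 1, recall is 5/6, and F1 is 10/11 ≈ 0.909.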