Abstract: Deep learning models need a sufficient amount of data to find the hidden patterns in it. The purpose of generative modeling is to learn the data distribution, which allows us to sample more data and augment the original dataset. Physiological data, and electrocardiogram (ECG) data in particular, is sensitive and expensive to collect, so generative models can be used to enlarge existing datasets and improve downstream tasks, in our case the classification of heart rhythm. In this work, we explore the usefulness of synthetic data generated with different deep generative models, namely Diffwave, Time-Diffusion and Time-VQVAE, for obtaining better classification results on two open-source multivariate ECG datasets. We also investigate the effects of transfer learning by fine-tuning a synthetically pre-trained model with progressively increasing proportions of real data. We conclude that, although the synthetic samples resemble the real ones, the classification improvement from simply augmenting the real dataset is barely noticeable on the individual datasets. When both datasets are merged, however, using synthetic samples as augmented data increases all metrics for the classifiers. In the fine-tuning results, the Time-VQVAE generative model proved superior to the others, but not powerful enough to achieve results close to those of a classifier trained on real data only. In addition, methods and metrics for measuring the closeness between synthetic and real data were explored as a by-product of the main research questions of this study.
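The transfer-learning protocol in the abstract, fine-tuning a synthetically pre-trained classifier on increasing proportions of real data, relies on drawing progressively larger subsets of the real dataset. The sketch below shows one way to draw such subsets so that class ratios are preserved. The function name `stratified_subset`, the label array, and the fraction schedule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def stratified_subset(labels, fraction, seed=0):
    """Return sorted indices of a class-stratified subset covering `fraction` of the data."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        # Keep at least one sample per class so rare rhythms are never dropped.
        n_keep = max(1, int(round(fraction * idx.size)))
        picked.append(rng.choice(idx, size=n_keep, replace=False))
    return np.sort(np.concatenate(picked))

# Hypothetical labels: 80 normal rhythms (0) and 20 arrhythmias (1).
real_labels = np.array([0] * 80 + [1] * 20)

# Assumed schedule of real-data proportions; the paper's actual values are not given here.
for fraction in (0.1, 0.25, 0.5, 1.0):
    subset = stratified_subset(real_labels, fraction)
    # A classifier pre-trained on synthetic ECGs would be fine-tuned on `subset` here.
    print(fraction, subset.size)
```

Stratifying per class keeps minority rhythms represented even at small fractions, which matters when comparing fine-tuning runs across proportions.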

What problem does this paper attempt to address?

This paper addresses two challenges in training deep learning models on electrocardiogram (ECG) data: **data scarcity and class imbalance**.

1. **Data scarcity**:
   - Physiological data, especially ECG data, is sensitive and costly to collect, so few datasets are publicly available.
   - Deep learning models usually need large amounts of data to find hidden patterns and train effectively, and the existing public datasets are often too small to meet this need.
2. **Class imbalance**:
   - In real-world ECG data, some classes (such as specific types of arrhythmia) have far fewer samples than the normal class.
   - This imbalance biases machine learning algorithms towards the majority class during training, which weakens the model's ability to recognize minority classes.

To address these problems, the authors generate synthetic ECG data with generative models and evaluate its effect on classification tasks. The paper considers the following generative models:

- **Diffwave**
- **Time-Diffusion**
- **Time-VQVAE**

The paper also studies transfer learning: a classifier is pre-trained on synthetic data and then fine-tuned with gradually increasing proportions of real data, to evaluate the impact of synthetic data on classification performance.

### Main contributions of the paper

1. **Selection and evaluation of generative models**:
   - Three generative models (Diffwave, Time-Diffusion, Time-VQVAE) are trained and evaluated on two publicly available multivariate ECG datasets (PTB-XL and Chapman).
2. **Quality evaluation of synthetic data**:
   - The similarity between synthetic and real data is assessed with multiple methods and metrics to check how closely the generated data matches the real data.
3. **Application of transfer learning**:
   - The effect of pre-training on synthetic data from the generative models and then fine-tuning on real data is explored, and the impact of different proportions of real data on classification performance is evaluated.
4. **Construction and evaluation of a combined dataset**:
   - The two datasets are merged into a larger combined dataset, and a new generative model is trained on it to further test the usefulness of synthetic data in classification tasks.

Through these methods, the paper aims to improve ECG classification performance, especially under data scarcity and class imbalance.
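As an illustration of how synthetic samples could offset class imbalance, the sketch below tops up every minority class to the size of the largest class. This balancing rule is an assumption made for the example; the summary does not say how many synthetic samples the authors add per class. `placeholder_generator` is a hypothetical stand-in that returns random noise where a trained generative model would return synthetic ECGs, and the 12-lead, 1000-sample shape is likewise assumed.

```python
import numpy as np

def counts_to_balance(labels):
    """Number of extra samples each class needs to match the largest class."""
    classes, counts = np.unique(np.asarray(labels), return_counts=True)
    target = counts.max()
    return {int(c): int(target - n) for c, n in zip(classes, counts)}

def augment_to_balance(x_real, y_real, sample_synthetic):
    """Append synthetic samples until all classes are equally represented."""
    xs, ys = [x_real], [np.asarray(y_real)]
    for cls, n_missing in counts_to_balance(y_real).items():
        if n_missing > 0:
            xs.append(sample_synthetic(cls, n_missing))
            ys.append(np.full(n_missing, cls))
    return np.concatenate(xs), np.concatenate(ys)

def placeholder_generator(cls, n, leads=12, length=1000, seed=0):
    """Hypothetical stand-in for a trained class-conditional generative model."""
    rng = np.random.default_rng(seed + cls)
    return rng.standard_normal((n, leads, length))

# Hypothetical imbalanced dataset: 7 normal, 2 and 1 samples of two arrhythmia classes.
x_real = np.zeros((10, 12, 1000))
y_real = np.array([0] * 7 + [1] * 2 + [2] * 1)

x_aug, y_aug = augment_to_balance(x_real, y_real, placeholder_generator)
print(np.bincount(y_aug))
```

Only the training split should be augmented this way; validation and test splits would stay purely real so that the reported metrics reflect performance on real ECGs.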

Synthetic ECG Generation for Data Augmentation and Transfer Learning in Arrhythmia Classification
