High quality ECG dataset based on MIT-BIH recordings for improved heartbeats classification

Ahmed.S Benmessaoud, Farida Medjani, Yahia Bousseloub, Khalid Bouaita, Dhia Benrahem, Tahar Kezai
2024-10-28
Abstract: The electrocardiogram (ECG) is a reliable tool for medical professionals to detect and diagnose abnormal heart waves that may cause cardiovascular diseases. This paper proposes a methodology to create a new high-quality heartbeat dataset from all 48 of the MIT-BIH recordings. The proposed approach computes an optimal heartbeat size by eliminating outliers and calculating the mean value over 10-second windows. This results in independent QRS-centered heartbeats, which avoids the problem of mixing successive heartbeats. The quality of the newly constructed dataset has been evaluated and compared with existing datasets. To this end, we built and trained a PyTorch 1-D ResNet architecture model that achieved 99.24% accuracy, a 5.7% improvement compared to other methods. Additionally, downsampling the dataset has improved the model's execution time by 33% and reduced memory usage by a factor of 3.
Signal Processing, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper aims to improve the accuracy of electrocardiogram (ECG) heartbeat classification. Specifically, the authors build a high-quality heartbeat dataset intended to outperform existing datasets on this task. The problems identified and the solutions proposed are as follows:

### 1. **Limitations of Existing Datasets**

- **Mixed-heartbeat Problem**: Existing datasets may mix adjacent heartbeat signals when extracting heartbeats, which lowers data quality. For example, Acharya et al. used a fixed length of 260 samples per heartbeat; this ignores the actual differences in length between heartbeats and may lead to information loss or to overlap between adjacent heartbeats.
- **Influence of Outliers**: The MIT-BIH dataset contains outliers (such as very short or very long heartbeat intervals) that degrade model training.

### 2. **Data Quality Problems**

- **Insufficient Data Pre-processing**: Existing heartbeat datasets do not fully address outlier removal or the choice of heartbeat length in the pre-processing stage, which directly affects the performance of subsequent classification tasks.

### 3. **Model Training Efficiency**

- **High Consumption of Computing Resources**: Deep-learning models require large amounts of data and computing resources. Memory use and computing time are especially high when processing ECG signals with a high sampling rate.

### Solutions:

To solve the above problems, the authors propose the following methods:

1. **Eliminate Outliers**: Identify and remove abnormal heartbeats with the IQR (interquartile range) method so that the dataset contains only high-quality heartbeat signals.
2. **Optimize Heartbeat Length**: Compute the heartbeat length dynamically from the mean RR interval within each 10-second window, so that each heartbeat is centered on the QRS complex and does not overlap adjacent heartbeats.
3. **Down-sampling and Normalization**: Reduce the original 360 Hz sampling rate to 120 Hz and apply z-score normalization, which reduces memory usage and speeds up model training.
4. **Model Architecture Design**: Build a 1-D ResNet model in PyTorch. The model has a depth of 34 layers, contains 3 residual blocks, extracts the features of ECG signals, and achieves a classification accuracy of 99.24%.

### Summary:

The core objective of this paper is to improve the accuracy of ECG heartbeat classification by creating a high-quality heartbeat dataset and combining it with effective pre-processing and an efficient deep-learning model. According to the reported results, the method reaches 99.24% accuracy (a 5.7% improvement over other methods), and down-sampling cuts execution time by 33% and memory usage by a factor of 3.
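
The pre-processing steps listed under Solutions can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`iqr_filter`, `beat_length`, `extract_beat`, `downsample`, `zscore`) are hypothetical, the IQR multiplier of 1.5 is the conventional default rather than a value taken from the paper, the beat length is assumed to equal the mean RR interval of the window, and the 360 Hz to 120 Hz step is shown as plain decimation without an anti-aliasing filter.

```python
from statistics import mean, pstdev, quantiles

FS = 360  # MIT-BIH sampling rate in Hz, as stated in the text


def iqr_filter(rr, k=1.5):
    """Keep RR intervals inside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumption)."""
    q1, _, q3 = quantiles(rr, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in rr if low <= x <= high]


def beat_length(r_peaks, start, window_s=10, fs=FS):
    """Mean RR interval (in samples) of the R peaks inside one 10-second window.

    Returns None when the window holds too few beats to estimate quartiles.
    """
    stop = start + window_s * fs
    peaks = [p for p in r_peaks if start <= p < stop]
    rr = [b - a for a, b in zip(peaks, peaks[1:])]
    if len(rr) < 2:
        return None
    kept = iqr_filter(rr) or rr  # fall back to all intervals if none survive
    return round(mean(kept))


def extract_beat(signal, r_peak, length):
    """Cut a segment centered on the R peak; returns 2 * (length // 2) samples.

    Returns None if the segment would run past either end of the recording.
    """
    half = length // 2
    lo, hi = r_peak - half, r_peak + half
    if lo < 0 or hi > len(signal):
        return None
    return signal[lo:hi]


def downsample(beat, factor=3):
    """Naive decimation: keep every 3rd sample (360 Hz -> 120 Hz), no filtering."""
    return beat[::factor]


def zscore(beat):
    """Z-score normalization: zero mean, unit (population) standard deviation."""
    mu, sd = mean(beat), pstdev(beat)
    if sd == 0:
        return [0.0] * len(beat)
    return [(x - mu) / sd for x in beat]
```

Keeping one sample in three is also where the threefold memory reduction comes from. A real pipeline would normally low-pass filter before decimating, for example with `scipy.signal.decimate`, to avoid aliasing in the signal-processing sense.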