Machine learning to classify left ventricular hypertrophy using ECG feature extraction by variational autoencoder

Amulya Gupta, Christopher J. Harvey, Ashley DeBauge, Sumaiya Shomaji, Zijun Yao, Amit Noheria
DOI: https://doi.org/10.1101/2024.10.14.24315460
2024-10-15
Abstract:

Background: Traditional ECG criteria for left ventricular hypertrophy (LVH) have low diagnostic yield. Machine learning (ML) can improve ECG classification.

Methods: ECG summary features (rate, intervals, axis), R-wave, S-wave and overall-QRS amplitudes, and QRS/QRST voltage-time integrals (VTIs) were extracted from 12-lead, vectorcardiographic X-Y-Z-lead, and root-mean-square (3D) representative-beat ECGs. Latent features were extracted by variational autoencoder from X-Y-Z and 3D representative-beat ECGs. Logistic regression, random forest, light gradient boosted machine (LGBM), residual network (ResNet) and multilayer perceptron network (MLP) models using ECG features and sex, and a convolutional neural network (CNN) using ECG signals, were trained to predict LVH (left ventricular mass index >95 g/m² in women, >115 g/m² in men) on 225,333 adult ECG-echocardiogram pairs (echocardiogram within 45 days of the ECG). AUROCs for LVH classification were obtained in a separate test set for individual ECG variables, traditional criteria and ML models.

Results: In the test set (n=25,263), AUROC for LVH classification was higher for ML models using ECG features (LGBM 0.790, MLP 0.789, ResNet 0.788) than for the best individual variable (VTIQRS-3D 0.677), the best traditional criterion (Cornell voltage-duration product 0.647) and the CNN using ECG signal (0.767). Among patients without LVH who had a follow-up echocardiogram >1 (closest to 5) years later, LGBM false positives, compared to true negatives, had a 2.63-fold (95% CI 2.01, 3.45) higher risk of developing LVH (p<0.0001).

Conclusions: ML models are superior to traditional ECG criteria for classifying LVH and for predicting future LVH. Models trained on extracted ECG features, including variational autoencoder latent variables, outperformed a CNN trained directly on the ECG signal.
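To make the outcome definition and the evaluation metric concrete, the following is a minimal sketch, not the authors' code. It labels LVH from left ventricular mass index using the sex-specific cutoffs quoted in the abstract, and computes AUROC as the fraction of positive/negative pairs that a score ranks correctly. The function names `has_lvh` and `auroc`, the `"F"`/`"M"` sex coding, and the toy patient values are all assumptions made for illustration.

```python
# Sex-specific LVH cutoffs from the abstract (left ventricular mass index, g/m^2).
# The "F"/"M" coding is an assumption for this sketch.
LVMI_CUTOFF = {"F": 95.0, "M": 115.0}


def has_lvh(lvmi_g_per_m2, sex):
    """Return True if the mass index exceeds the cutoff for the given sex."""
    if sex not in LVMI_CUTOFF:
        raise ValueError(f"unknown sex code: {sex!r}")
    return lvmi_g_per_m2 > LVMI_CUTOFF[sex]


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as half. Quadratic in sample size, so suitable
    only for small illustrative inputs."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (mass index, sex, model score). Not from the paper.
patients = [
    (120.0, "M", 0.9),
    (100.0, "M", 0.4),
    (96.0, "F", 0.7),
    (80.0, "F", 0.2),
    (130.0, "F", 0.3),
]
labels = [has_lvh(lvmi, sex) for lvmi, sex, _ in patients]
scores = [score for _, _, score in patients]
toy_auroc = auroc(labels, scores)
```

On real data a rank-based library routine would be preferable to the double loop; the pairwise form is used here only because it mirrors the definition of AUROC directly.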
What problem does this paper attempt to address?
This paper aims to improve the diagnostic performance of the electrocardiogram (ECG) in classifying left ventricular hypertrophy (LVH). Traditional ECG criteria detect LVH poorly and have low diagnostic yield. The authors use machine learning (ML), in particular ECG features extracted by a variational autoencoder (VAE), to improve LVH classification. Specifically, the research aims to:

1. **Evaluate the performance of traditional ECG criteria**: Compare traditional ECG criteria (such as the Cornell voltage-duration product) against machine-learning models to quantify their limitations in diagnosing LVH.
2. **Develop and test new machine-learning models**: Train different models (logistic regression, random forest, light gradient boosted machine, residual network and multilayer perceptron) on features extracted from the ECG, including latent variables generated by a variational autoencoder, to improve LVH classification.
3. **Explore model performance in different subgroups**: Analyze how these models perform by sex and by type of intraventricular conduction (narrow QRS, typical right bundle-branch block, typical left bundle-branch block and intraventricular conduction delay).
4. **Evaluate the models' ability to predict future LVH**: Among patients without LVH at baseline, compare model false positives with true negatives to assess whether a false-positive result signals a higher risk of developing LVH later.

Together, these analyses are intended to show the potential of machine-learning models to improve the diagnostic accuracy of ECG for LVH and to support future clinical applications.
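The comparison of false positives with true negatives in point 4 can be illustrated with a simple risk ratio. The sketch below is an assumption-laden example, not the paper's analysis: the summary does not state which statistical method produced the reported 2.63-fold figure, so a plain risk ratio with a log-scale (Katz) 95% confidence interval is swapped in for illustration. The function name `risk_ratio_ci` and all counts are hypothetical.

```python
import math


def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A versus group B with a log-scale (Katz)
    confidence interval. z=1.96 gives an approximate 95% interval.
    All event counts must be positive."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    rr = risk_a / risk_b
    # Standard error of log(rr) for two independent binomial proportions.
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts, NOT from the paper:
# 30 of 100 false positives and 40 of 400 true negatives later develop LVH.
rr, lower, upper = risk_ratio_ci(30, 100, 40, 400)
print(f"risk ratio {rr:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 would indicate that the false-positive group carries a higher risk of later LVH than the true-negative group. A time-to-event analysis would be the more natural choice when follow-up duration varies between patients.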