Unveiling AI-ECG using Generative Counterfactual XAI Framework
Jang, J.-H., Jo, Y.-Y., Kang, S., Son, J. M., Lee, H. S., Kwon, J.-m., Lee, M. S.
DOI: https://doi.org/10.1101/2024.09.29.24314144
2024-10-01
MedRxiv
Abstract: Background: The application of artificial intelligence (AI) to electrocardiograms (ECGs) has shown great promise in the screening and diagnosis of cardiovascular diseases, often matching or surpassing human expertise. However, the black-box nature of deep learning models poses significant challenges to their clinical adoption. Explainable AI (XAI) techniques, such as saliency maps, have attempted to address these issues but have not provided clear, clinically relevant explanations. We developed the Generative Counterfactual ECG XAI (GCX) framework, which uses counterfactual scenarios to explain AI predictions, enhancing interpretability and aligning with medical knowledge. Methods: We designed a study to validate the GCX framework by applying it to eight AI-ECG models, including models for regression of six ECG features, potassium level regression, and atrial fibrillation (AF) classification. The PTB-XL and MIMIC-IV datasets were used for development and testing. GCX generated counterfactual (CF) ECGs to visualize how changes in the ECG relate to AI-ECG predictions. We visualized CF ECGs for qualitative comparisons, statistically compared ECG features, and validated these findings against conventional ECG knowledge. Results: The GCX framework successfully generated interpretable ECGs aligned with clinical knowledge in ECG feature regression, potassium level regression, and AF classification. For ECG feature regression, GCX demonstrated clear and consistent changes in features, reflecting the corresponding morphological alterations. CF ECGs for hyperkalemia showed a prolonged PR, discernible P wave, increased T wave amplitude, and widened QRS complex, whereas those for AF demonstrated the disappearance of the P wave and irregular rhythms. Conclusion: The GCX framework enhances the interpretability of AI-ECG models, offering clear, relevant explanations for AI predictions. This approach holds substantial potential for improving the trust and utility of AI in clinical practice, although further validation across diverse datasets is required.