Abstract:

Background: In intensive care units (ICUs), critically ill patients are monitored with electroencephalography (EEG) to prevent serious brain injury. EEG monitoring is constrained by clinician availability, and EEG interpretation can be subjective and prone to interobserver variability. Automated deep-learning systems for EEG could reduce human bias and accelerate the diagnostic process. However, existing uninterpretable (black-box) deep-learning models are untrustworthy and difficult to troubleshoot, and they lack accountability in real-world applications, which has led to a lack of both trust and adoption by clinicians.

Methods: We developed an interpretable deep-learning system that accurately classifies six patterns of potentially harmful EEG activity - seizure, lateralized periodic discharges (LPDs), generalized periodic discharges (GPDs), lateralized rhythmic delta activity (LRDA), generalized rhythmic delta activity (GRDA), and other patterns - while providing faithful case-based explanations of its predictions. The model was trained on 50,697 total 50-second continuous EEG samples collected from 2711 patients in the ICU between July 2006 and March 2020 at Massachusetts General Hospital. EEG samples were labeled as one of the six EEG patterns by 124 domain experts and trained annotators. To evaluate the model, we asked eight medical professionals with relevant backgrounds to classify 100 EEG samples into the six pattern categories - once with and once without artificial intelligence (AI) assistance - and we assessed the assistive power of this interpretable system by comparing the diagnostic accuracy of the two methods. The model's discriminatory performance was evaluated with area under the receiver-operating characteristic curve (AUROC) and area under the precision-recall curve. The model's interpretability was measured with task-specific neighborhood agreement statistics that interrogated the similarities of samples and features. In a separate analysis, the latent space of the neural network was visualized by using dimension reduction techniques to examine whether the ictal-interictal injury continuum hypothesis, which asserts that seizures and seizure-like patterns of brain activity lie along a spectrum, is supported by data.

Results: The performance of all users significantly improved when provided with AI assistance. Mean user diagnostic accuracy improved from 47 to 71% (P<0.04). The model achieved AUROCs of 0.87, 0.93, 0.96, 0.92, 0.93, and 0.80 for the classes seizure, LPD, GPD, LRDA, GRDA, and other patterns, respectively. This performance was significantly higher than that of a corresponding uninterpretable black-box model (P<0.0001). Videos traversing the ictal-interictal injury manifold from dimension reduction (a two-dimensional representation of the original high-dimensional feature space) give insight into the layout of EEG patterns within the network's latent space and illuminate relationships between EEG patterns that were previously hypothesized but had not yet been shown explicitly. These results indicate that the ictal-interictal injury continuum hypothesis is supported by data.

Conclusions: Users showed significant pattern classification accuracy improvement with the assistance of this interpretable deep-learning model. The interpretable design facilitates effective human-AI collaboration; this system may improve diagnosis and patient care in clinical settings. The model may also provide a better understanding of how EEG patterns relate to each other along the ictal-interictal injury continuum. (Funded by the National Science Foundation, National Institutes of Health, and others.)
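
The per-class AUROC values reported in the abstract can be illustrated with a minimal sketch. It assumes a one-vs-rest scheme, which the abstract does not state explicitly, and it is not the authors' code: the function names `binary_auroc` and `one_vs_rest_auroc`, the three-class subset, and the toy probabilities are all hypothetical. Each class's AUROC is computed as the probability that a randomly chosen sample of that class receives a higher score than a randomly chosen sample of any other class, with ties counted as one half.

```python
from typing import Dict, List, Sequence


def binary_auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via pairwise comparison of positive and negative scores.

    Equivalent to the Mann-Whitney U statistic divided by
    (number of positives * number of negatives); ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(
    probs: Sequence[Sequence[float]],
    y_true: Sequence[int],
    class_names: Sequence[str],
) -> Dict[str, float]:
    """Compute one AUROC per class, treating that class as positive
    and all remaining classes as negative (an assumed scheme)."""
    result: Dict[str, float] = {}
    for k, name in enumerate(class_names):
        scores = [row[k] for row in probs]          # predicted probability of class k
        labels = [1 if y == k else 0 for y in y_true]  # class k vs. everything else
        result[name] = binary_auroc(scores, labels)
    return result


if __name__ == "__main__":
    # Toy data only: three of the six pattern names, four made-up samples.
    class_names: List[str] = ["seizure", "LPD", "GPD"]
    probs = [
        [0.7, 0.2, 0.1],
        [0.1, 0.5, 0.4],
        [0.2, 0.3, 0.5],
        [0.3, 0.6, 0.1],
    ]
    y_true = [0, 1, 2, 0]
    for name, auc in one_vs_rest_auroc(probs, y_true, class_names).items():
        print(f"{name}: {auc:.3f}")
```

The pairwise form is quadratic in the number of samples, so it is suitable only for small illustrations; a rank-based implementation would be the usual choice for a dataset of the size described above.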

Data leakage in deep learning studies of translational EEG

How You Split Matters: Data Leakage and Subject Characteristics Studies in Longitudinal Brain MRI Analysis

Inflated prediction accuracy of neuropsychiatric biomarkers caused by data leakage in feature selection

Assisting Schizophrenia Diagnosis Using Clinical Electroencephalography and Interpretable Graph Neural Networks: a Real-World and Cross-Site Study

Data leakage inflates prediction performance in connectome-based machine learning models

The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study

Hazards of data leakage in machine learning: a study on classification of breast cancer using deep neural networks

Brain Age Prediction/Classification through Recurrent Deep Learning with Electroencephalogram Recordings of Seizure Subjects

Data-driven retrieval of population-level EEG features and their role in neurodegenerative diseases

Differentiating Ischemic Stroke Patients from Healthy Subjects Using a Large-Scale, Retrospective EEG Database and Machine Learning Methods

Deep learning-based electroencephalography analysis: a systematic review

Precise Discrimination for Multiple Etiologies of Dementia Cases Based on Deep Learning with Electroencephalography

Geometric Deep Learning for Subject Independent Epileptic Seizure Prediction Using Scalp EEG Signals

Stabilizing Subject Transfer in EEG Classification with Divergence Estimation

Improving Clinician Performance in Classifying EEG Patterns on the Ictal-Interictal Injury Continuum Using Interpretable Machine Learning

Bridging the gap between patient-specific and patient-independent seizure prediction via knowledge distillation

Moving the field forward: detection of epileptiform abnormalities on scalp electroencephalography using deep learning-clinical application perspectives

SEEG-Net: An explainable and deep learning-based cross-subject pathological activity detection method for drug-resistant epilepsy

Deep Learning in current Neuroimaging: a multivariate approach with power and type I error control but arguable generalization ability

Towards Early Diagnosis of Epilepsy from EEG Data