Continental generalization of an AI system for clinical seizure recognition

Yikai Yang, Nhan Duy Truong, Christina Maher, Armin Nikpour, Omid Kavehei
DOI: https://doi.org/10.1101/2021.03.07.433990
2021-03-08
Abstract:
Background: Electroencephalogram (EEG) monitoring and objective seizure identification are essential clinical investigations for some patients with epilepsy. Accurate annotation is a time-consuming process performed by EEG specialists. Computer-assisted seizure detection systems currently have limited clinical utility because they rest on retrospective, patient-specific, and/or irreproducible studies, which result in low sensitivity or high false-positive rates in clinical tests. We aim to substantially reduce the time and resources spent on data annotation by demonstrating a continental generalization of seizure detection that balances sensitivity and specificity.
Methods: This is a prospective inference test of an artificial intelligence (AI) system on nearly 14,590 hours of adult EEG data from patients with epilepsy, recorded between 2011 and 2019 at a hospital in Sydney, Australia. The inference set includes patients with different seizure types and frequencies across a wide range of ages and EEG recording durations. The AI is a convolutional long short-term memory network trained on a USA-based dataset. The Australian set is about 16 times larger than the US training set and contains very long interictal (between-seizure) periods, which makes it far more realistic than the training set and makes our false-positive estimates highly reliable. We validated the model on 66 sessions in an AI-assisted mode, with a human expert arbiter and a results-review panel of expert neurologists and EEG specialists, to show that the same performance can be achieved with more than an order-of-magnitude reduction in time.
Findings: Inference on 1,006 EEG recording sessions from the Australian dataset achieved 76.68% sensitivity with nearly 56 [0, 115] false alarms per 24 hours on average, measured against legacy ground-truth annotations made independently by human experts over nine years.
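To make the two headline figures concrete (sensitivity and false alarms per 24 hours), here is a minimal Python sketch of event-level scoring. It assumes any-overlap scoring: an annotated seizure counts as detected if any detection overlaps it, and a detection counts as a false alarm if it overlaps no annotated seizure. The abstract does not state the study's exact scoring protocol, so this is an illustration rather than the authors' method. The names `Interval` and `event_metrics` are hypothetical, and the example intervals are invented, not study data.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Interval:
    """A half-open time interval [start, end) in seconds (hypothetical helper)."""
    start: float
    end: float

    def overlaps(self, other: "Interval") -> bool:
        # Two half-open intervals overlap if each starts before the other ends.
        return self.start < other.end and other.start < self.end


def event_metrics(
    seizures: Sequence[Interval],
    detections: Sequence[Interval],
    recording_hours: float,
) -> Tuple[float, float]:
    """Return (sensitivity, false alarms per 24 h).

    ASSUMPTION: any-overlap event scoring; not necessarily the study's protocol.
    """
    # A seizure counts as detected if at least one detection overlaps it.
    detected = sum(any(s.overlaps(d) for d in detections) for s in seizures)
    sensitivity = detected / len(seizures) if seizures else float("nan")

    # A detection is a false alarm if it overlaps no annotated seizure.
    false_alarms = sum(not any(d.overlaps(s) for s in seizures) for d in detections)

    # Normalize to a 24-hour rate so recordings of different lengths are comparable.
    fa_per_24h = false_alarms * 24.0 / recording_hours
    return sensitivity, fa_per_24h


if __name__ == "__main__":
    # Invented example: two annotated seizures in a 12-hour recording.
    seizures = [Interval(100, 160), Interval(5000, 5090)]
    detections = [Interval(120, 140), Interval(9000, 9010), Interval(20000, 20030)]
    sens, fa = event_metrics(seizures, detections, recording_hours=12.0)
    print(sens, fa)  # 0.5 4.0
```

Normalizing by recording hours is why long interictal periods matter for this metric: false alarms accumulate over seizure-free time, so a test set with short interictal periods gives little evidence about the per-24-hour rate.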
Our pilot test on 66 sessions, with a human arbiter and ground truth reviewed by a panel of experts, confirmed that the AI-assisted system achieves performance identical to that of human experts (92.19%), while reducing the average time required from 90 to 7.62 minutes.
Interpretation: Accurate and objective seizure counting is important in epilepsy care. An AI-assisted system can help human experts improve efficiency and accuracy, particularly in low- and middle-income countries with limited expert human resources.
Funding: SOAR Fellowship from The University of Sydney, a Microsoft AI for Accessibility grant, and Research Training Program (RTP) support provided by the Australian Government.
Research in context
Evidence before this study: During the development of our AI system, we conducted a systematic review of the scientific literature, searching PubMed for research articles on seizure detection with the following inclusion criteria: (1) testing or inference evaluation is conducted on large-scale clinical EEG data; (2) generalization is attempted, or the potential for generalization is considered (e.g., in commercial tools); (3) seizure detection delay and real-time (i.e., online) operation are not considered critical in this context, as long as the test is conducted on raw EEG data. Note that ICU seizure detection and portable seizure alert systems do depend on low detection delay and real-time operation. Our keywords included “prospective seizure detection”, “automated seizure detection”, “non-patient specific seizure detection”, “seizure detection on continuous EEG”, “deep learning-based seizure detection”, and “machine learning-based seizure detection”. We found that only two categories of work met our criteria: two research papers published in 2020, and work published by commercial tool developers. We also cite a recent review of 89 deep learning-based seizure detection studies, all of which are retrospective.
One work, from Stanford, reported seizure detection across all ages (pediatric to adult) using post-acquisition EEG recordings, and enabled independent evaluation by reporting a test on the publicly available Temple University Hospital (TUH) EEG dataset. The other work focused on algorithm-assisted, real-time seizure risk monitoring of continuous EEG in a neonatal intensive care unit (NICU): 128 neonates (32 with seizures) monitored with algorithmic assistance showed about a 20% improvement in seizure identification over 130 neonates (38 with seizures) monitored without it. The commercial tools we studied were Encevis (EpiScan), Besa, and Persyst. A recent study compared these tools on 81 patients. Encevis was reported as the best-performing tool, so we conducted a comparative study with Encevis version 1.9.2. Encevis is also the only tool that enabled a comparative study on publicly available EEG data. The Stanford work, published in 2020, reports many false positives with Persyst 13. We excluded our tests of Persyst 14 because it substantially underperformed Encevis. Only the Stanford work makes its code available. We compared our results with the outcomes of the Stanford work, and we report pilot test results for the Encevis (EpiScan) tool on the Australian dataset, which show considerably lower sensitivity for Encevis.
Added value of this study: To the best of our knowledge, this is the first study to demonstrate continental generalization of seizure detection, showing the potential to achieve an expert human-level seizure recognition rate in a clinical setting in a fraction of the time. The two datasets used in this study were recorded with different infrastructure, which supports the independence of inference from hardware type and improves clinical utility. This is particularly important because 80% of patients with epilepsy live in low- and middle-income countries with limited resources, especially EEG specialists and neurologists.
Implications of all the available evidence: Our results support the potential benefits of deep learning AI for seizure recognition in clinical settings, including significantly higher sensitivity than available solutions. Our AI-assisted system achieves more than a ten-fold increase in time efficiency and performance identical to that of human experts who interpret EEG with access to strong neurophysiology support and auxiliary data. Our findings, particularly our tests of an available commercial tool, suggest that evaluation, testing, or inference of AI systems should be performed on different datasets, recorded with diverse infrastructure, and on large-scale, realistic sets with long interictal periods.
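As a rough check of the time-efficiency claim, the two average review times reported for the 66-session pilot (90 and 7.62 minutes) give the ratios below. This is simple arithmetic on the reported averages only; it does not reflect per-session variation, which the abstract does not report.

```latex
\frac{90\ \text{min}}{7.62\ \text{min}} \approx 11.81,
\qquad
\frac{90 - 7.62}{90} \approx 0.915
```

That is roughly an 11.8-fold speed-up, or about a 91.5% reduction in average review time. Since \log_{10}(11.81) \approx 1.07 > 1, this is consistent with both "more than a ten-fold increase" and "over an order-of-magnitude reduction".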