Reproducible Python workflow for multi‐site resting‐state EEG analysis: From raw data to group classification

Alberto Jaramillo‐Jimenez, Yorguin Jose Mantilla‐Ramos, Diego Tovar, John F Ochoa, Kolbjørn Brønnick, Dag Aarsland, Laura Bonanni
DOI: https://doi.org/10.1002/alz.076353
2023-12-01
Abstract

Background: Among promising markers for neurodegenerative disorders, the electroencephalogram (EEG) is a non‐invasive and potentially portable option. Standardized preprocessing methods may reduce inter‐observer variability, while pooling multi‐centric EEG data may yield more generalizable results. We aim to formulate a reproducible pipeline for group‐level classification of Parkinson's disease (PD) based on various features of resting‐state EEG (rsEEG) recordings collected in multi‐site studies.

Methods: We used four datasets (acquired in Colombia, Finland, and the USA) comprising 169 subjects (84 PD; 85 non‐PD healthy controls). The workflow included standardization of raw rsEEG files, preprocessing, extraction of spectral band powers and multiple entropy features in sensor space, harmonization of multi‐site features, and group‐level classification using supervised machine learning. Center/scanner‐related variability in the extracted features was harmonized using the ComBat approach and its variants [1]. Finally, sequential feature selection (SFS) and binary classification (non‐PD vs. PD) were performed using XGradientBoosting [2]. The full workflow is detailed in Figure 1.

Results: t‐distributed stochastic neighbor embedding (t‐SNE) plots showed that center‐related variability in spectral features was reduced after harmonization with ComBat and its bootstrapped variant (Figure 2). SFS was performed on 224 harmonized features; performance was highest when alpha, pre‐alpha, and slow‐theta band features were retained for subsequent classification. The XGradientBoosting classifier achieved an AUC of 0.826 in separating PD from non‐PD (Figure 3).

Conclusions: In multicenter studies, harmonizing sensor‐space rsEEG features with ComBat can help control batch effects.
XGradientBoosting with rsEEG features and SFS classified non‐PD vs. PD well (AUC 0.826). Intended as a practical tool for group‐level analysis of neurodegenerative disorders based on sensor‐space rsEEG features, our workflow is openly available at https://github.com/alberto-jj/raw_to_classification and can be easily adapted to multi‐class classification and regression tasks.

References
1. Da‐ano, R., et al. Sci. Rep. 10, 10 (2020).
2. Moguilner, S., et al. J. Neural Eng. 19, 046048 (2022).
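The spectral band‐power extraction described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes preprocessed sensor‐space data as a NumPy array of shape (n_channels, n_samples) and uses SciPy's Welch estimator. The band edges in `BANDS` are hypothetical, and so is the function name `relative_band_powers`. That includes the slow‐theta and pre‐alpha limits, which the abstract does not state.

```python
import numpy as np
from scipy.signal import welch

# Hypothetical band edges in Hz, as half-open intervals [lo, hi).
# The abstract does not give the limits the authors used.
BANDS = {
    "delta": (1.0, 4.0),
    "slow_theta": (4.0, 5.5),
    "pre_alpha": (5.5, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def relative_band_powers(data, sfreq, bands=BANDS, win_sec=2.0):
    """Return {band: array of shape (n_channels,)} of relative band power.

    data: preprocessed rsEEG, shape (n_channels, n_samples).
    Each band's power is divided by the power over the whole range spanned
    by `bands`, so values per channel sum to 1 when the bands are contiguous.
    """
    data = np.atleast_2d(np.asarray(data, dtype=float))
    nperseg = int(win_sec * sfreq)
    # Welch PSD per channel: freqs has shape (n_freqs,), psd (n_channels, n_freqs)
    freqs, psd = welch(data, fs=sfreq, nperseg=nperseg, axis=-1)

    lo_all = min(lo for lo, _ in bands.values())
    hi_all = max(hi for _, hi in bands.values())
    total_mask = (freqs >= lo_all) & (freqs < hi_all)
    # Frequency bins are equally spaced, so the bin width cancels in the ratio.
    total = psd[:, total_mask].sum(axis=-1)

    return {
        name: psd[:, (freqs >= lo) & (freqs < hi)].sum(axis=-1) / total
        for name, (lo, hi) in bands.items()
    }


if __name__ == "__main__":
    sfreq = 250.0
    t = np.arange(0, 60, 1 / sfreq)
    rng = np.random.default_rng(0)
    # Two synthetic channels: a 10 Hz (alpha) and a 6 Hz (pre-alpha) rhythm.
    eeg = np.vstack([np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 6 * t)])
    eeg += 0.1 * rng.standard_normal(eeg.shape)
    for band, values in relative_band_powers(eeg, sfreq).items():
        print(band, np.round(values, 3))
```

Relative rather than absolute power is one common choice for pooling across sites, because it removes overall amplitude differences between recording setups. The authors' feature set may differ.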
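The abstract mentions "multiple entropy features" without naming them. As one hedged example of an entropy feature that can be computed per channel, the sketch below computes normalized permutation entropy (Bandt and Pompe). It is not necessarily one of the measures the authors used. The function name `permutation_entropy` and the defaults `order=3`, `delay=1` are illustrative assumptions.

```python
import math

import numpy as np


def permutation_entropy(x, order=3, delay=1, normalize=True):
    """Permutation entropy of a 1-D signal (illustrative sketch).

    Each delay vector (x[j], x[j+delay], ..., x[j+(order-1)*delay]) is
    mapped to its ordinal pattern; the Shannon entropy (in bits) of the
    pattern distribution is returned, divided by log2(order!) if normalize.
    """
    x = np.asarray(x, dtype=float)
    n = x.size - (order - 1) * delay
    if n <= 0:
        raise ValueError("signal too short for the given order and delay")

    # Time-delay embedding: row j is the j-th delay vector, shape (n, order).
    emb = np.array([x[i * delay : i * delay + n] for i in range(order)]).T
    # Ordinal pattern of each row; a stable sort breaks ties by position.
    patterns = np.argsort(emb, axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)

    p = counts / counts.sum()
    h = -np.sum(p * np.log2(p))
    if normalize:
        h /= np.log2(math.factorial(order))
    return float(h)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    print(permutation_entropy(np.arange(100)))            # monotonic: one pattern
    print(permutation_entropy(rng.standard_normal(5000)))  # white noise: near 1
```

For a (n_channels, n_samples) array, calling the function once per channel (for example `[permutation_entropy(ch) for ch in eeg]`) yields one feature per sensor. That matches the sensor‐space framing of the workflow.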
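The ComBat harmonization step can be illustrated with a simplified location/scale adjustment. The sketch follows ComBat's model in three steps:

- It standardizes each feature against a least‐squares fit of site indicators plus optional biological covariates, such as diagnosis, that should be preserved.
- It removes each site's remaining mean and variance shift.
- It rescales to the pooled scale.

Unlike ComBat proper, it omits the empirical Bayes shrinkage of the site parameters and the bootstrapped variant mentioned in the Results. It is therefore a simplification for illustration only. The function name `location_scale_harmonize` is hypothetical.

```python
import numpy as np


def location_scale_harmonize(features, site, covariates=None):
    """Simplified ComBat-style harmonization WITHOUT empirical Bayes.

    features:   (n_subjects, n_features) array, e.g. band powers per channel.
    site:       (n_subjects,) site/center labels.
    covariates: optional (n_subjects, n_cov) biological covariates to keep.
    Each site needs at least two subjects (per-site SD uses ddof=1).
    """
    y = np.asarray(features, dtype=float)
    site = np.asarray(site)
    levels, site_idx = np.unique(site, return_inverse=True)
    n, k = y.shape[0], len(levels)

    batch = np.eye(k)[site_idx]  # one-hot site design, shape (n, k)
    if covariates is None:
        design = batch
    else:
        design = np.hstack([batch, np.asarray(covariates, float).reshape(n, -1)])

    # Least-squares fit of site intercepts (+ covariate effects) per feature.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Grand mean: site intercepts weighted by site size.
    grand_mean = (batch.sum(axis=0) / n) @ beta[:k]
    # Covariate contribution (all zeros when there are no covariates).
    stand_mean = grand_mean + design[:, k:] @ beta[k:]
    pooled_sd = np.sqrt(((y - design @ beta) ** 2).mean(axis=0))

    z = (y - stand_mean) / pooled_sd
    adjusted = np.empty_like(z)
    for s in range(k):
        m = site_idx == s
        gamma = z[m].mean(axis=0)          # additive site effect
        delta = z[m].std(axis=0, ddof=1)   # multiplicative site effect
        adjusted[m] = (z[m] - gamma) / delta
    return adjusted * pooled_sd + stand_mean


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    site = np.repeat(["A", "B"], 50)
    X = rng.standard_normal((100, 4))
    X[site == "B"] = X[site == "B"] * 3.0 + 5.0  # simulated site shift
    H = location_scale_harmonize(X, site)
    print(H[site == "A"].mean(axis=0), H[site == "B"].mean(axis=0))
```

Within a classification pipeline, estimating site parameters on the full dataset lets information cross cross‐validation folds. Fitting the adjustment on training folds only is the more conservative choice.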
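Sequential feature selection followed by a gradient‐boosted classifier, scored by AUC, can be sketched with scikit‐learn. "XGradientBoosting" is assumed here to refer to XGBoost. To keep the example dependency‐light, scikit‐learn's `GradientBoostingClassifier` is swapped in. `xgboost.XGBClassifier` exposes a scikit‐learn‐compatible `fit`/`predict_proba` interface and could be substituted. The helper names, `n_select`, and the booster settings are hypothetical, not the authors' configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline


def make_sfs_pipeline(n_select=2, seed=0):
    """Forward SFS (scored by ROC AUC) feeding a gradient-boosting classifier.

    GradientBoostingClassifier stands in for XGBoost (assumption).
    """
    booster = dict(n_estimators=30, max_depth=2, random_state=seed)
    return Pipeline([
        ("sfs", SequentialFeatureSelector(
            GradientBoostingClassifier(**booster),
            n_features_to_select=n_select,
            direction="forward",
            scoring="roc_auc",
            cv=3,
        )),
        ("clf", GradientBoostingClassifier(**booster)),
    ])


def cross_validated_auc(X, y, n_select=2, n_splits=5, seed=0):
    """Mean outer-fold ROC AUC; SFS is refit inside each training fold."""
    outer = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(
        make_sfs_pipeline(n_select, seed), X, y, cv=outer, scoring="roc_auc"
    )
    return float(scores.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.standard_normal((160, 8))  # synthetic stand-in for harmonized features
    y = (X[:, 0] + 0.5 * X[:, 1] + 0.5 * rng.standard_normal(160) > 0).astype(int)
    print("CV AUC:", round(cross_validated_auc(X, y), 3))
    pipe = make_sfs_pipeline().fit(X, y)
    print("selected:", np.flatnonzero(pipe.named_steps["sfs"].get_support()))
```

Wrapping selection and classification in one `Pipeline` keeps the held‐out fold out of feature selection. Selecting features on all subjects before cross‐validation would inflate the reported AUC.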