Master Computer Science
Investigating the epoch size and feature engineering for Automated Machine Learning in EEG data analysis

Furong Ye, Konstantina Papada, Victor Geraedts
Abstract: This study investigates the impact of epoch length and feature engineering techniques on Electroencephalography (EEG) classification tasks. The work is carried out on a dataset consisting of EEG recordings from 47 Parkinson’s Disease patients, with five 10-second epochs per patient. We partition each 10-second epoch into two 5-second epochs to compare the results obtained with different epoch lengths. For feature engineering, we compare two techniques: a combination of ts-fresh and Boruta, and Catch22. The former applies ts-fresh to extract a large set of time-series features for each epoch of EEG data and uses Boruta to select a small set of significant features for the classification model. Catch22 is a collection of 22 canonical time-series characteristics. An automated random forest model, tuned using Bayesian optimization, is applied to the classification tasks based on the features provided by the feature engineering step. The combination of ts-fresh and Boruta shows similar performance on the 10-second and 5-second epoch data. However, in experiments using the average of 5 epochs, the 10-second data performs better, with an F1-score of 92%, whereas in experiments using individual epochs (e.g., conducting five independent experiments using one epoch each), the 5-second data performs better, with a maximum F1-score of 96.5%. In addition, we conduct two experiments on Catch22, namely the Catch22-Compact method (i.e., selecting 22 out of 5*22 features) and the Catch22-Comprehensive method (using all 110 features). On the 10-second data, the Catch22-Compact method obtains a maximum F1-score of 97% on individual epochs and an F1-score of 93.4% for the average of 5 epochs; on the 5-second data, it obtains 99.3% and 96.5% for the same settings. In conclusion, this study demonstrates that classification performance depends on both the feature engineering technique and the EEG epoch length.
The Catch22-Compact method outperforms the other tested feature engineering methods across all data settings in this thesis.
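To make the epoch partitioning and the Catch22 feature counts concrete, the following is a minimal sketch and not the thesis implementation. It assumes a single channel, a hypothetical sampling rate of 256 Hz (the abstract does not state one), synthetic sample values, and it reads "5*22" as five epochs times 22 Catch22 features. The helper name `split_epoch` is introduced here for illustration only.

```python
from typing import List, Sequence

SAMPLING_RATE_HZ = 256  # assumption: the abstract does not state the sampling rate


def split_epoch(epoch: Sequence[float], n_parts: int = 2) -> List[List[float]]:
    """Split one epoch into n_parts equal-length, non-overlapping sub-epochs."""
    if len(epoch) % n_parts != 0:
        raise ValueError("epoch length must be divisible by n_parts")
    part_len = len(epoch) // n_parts
    return [list(epoch[i * part_len:(i + 1) * part_len]) for i in range(n_parts)]


# Synthetic stand-in for one patient: five 10-second epochs of one channel.
ten_second_epochs = [
    [float(i) for i in range(10 * SAMPLING_RATE_HZ)] for _ in range(5)
]

# Each 10-second epoch yields two 5-second epochs, so 5 epochs become 10.
five_second_epochs = [
    half for epoch in ten_second_epochs for half in split_epoch(epoch)
]

# Feature-vector sizes for the two Catch22 variants named in the abstract
# (assumed reading: 22 features per epoch, five epochs per patient).
N_CATCH22_FEATURES = 22
comprehensive_size = len(ten_second_epochs) * N_CATCH22_FEATURES  # 5 * 22
compact_size = N_CATCH22_FEATURES  # 22 features selected out of the 5 * 22
```

Under these assumptions, the Comprehensive variant uses 110 values per patient and the Compact variant keeps 22 of them. How the 22 are selected is not described in the abstract and is left out of this sketch.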
Computer Science, Medicine