Abstract: This study investigates the impact of epoch length and feature engineering techniques on Electroencephalography (EEG) classification tasks. The work uses a dataset of EEG recordings from 47 Parkinson's Disease patients, with five 10-second epochs per patient. We partition each 10-second epoch into two 5-second epochs to compare results across epoch lengths. For feature engineering, we compare two techniques: a combination of ts-fresh and Boruta, and Catch22. The former applies ts-fresh to extract a large number of time-series features from each EEG epoch and uses Boruta to select a small set of significant features for the classification model. Catch22 is a collection of 22 canonical time-series characteristics. An automated random forest model, tuned using Bayesian optimization, performs the classification based on the features provided by the feature engineering step. With the combination of ts-fresh and Boruta, the 10-second and 5-second epochs perform similarly overall. However, in experiments using the average of 5 epochs, the 10-second data performs better, with an F1-score of 92%, whereas in experiments on individual epochs (conducting five independent experiments, each using one epoch), the 5-second data performs better, with a maximum F1-score of 96.5%. In addition, we conduct two experiments with Catch22, namely the Catch22-Compact method (selecting 22 out of 5 × 22 = 110 features) and the Catch22-Comprehensive method (using all 110 features). On 10-second data, the Catch22-Compact method obtains a maximum F1-score of 97% on individual epochs and an F1-score of 93.4% for the average of 5 epochs; for the same settings on 5-second data, it obtains 99.3% and 96.5%, respectively. In conclusion, this study demonstrates that classification performance depends on both the feature engineering technique and the EEG epoch length. Among the feature engineering methods tested in this thesis, the Catch22-Compact method performs best across all data settings.
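As a rough illustration of the data handling described in the abstract, the following minimal sketch splits each 10-second epoch into two 5-second epochs and concatenates per-epoch features into one vector per patient, which is how a count such as 5 × 22 = 110 arises. It is not the thesis implementation: the sampling rate, the synthetic signals, and the helper names `split_epoch`, `toy_features`, and `concatenate_features` are all assumptions made for illustration, and `toy_features` is a two-feature stand-in for a real extractor such as Catch22.

```python
from statistics import mean

SAMPLING_RATE_HZ = 4  # hypothetical toy value; the abstract does not state the real rate


def split_epoch(epoch, n_parts=2):
    """Split one epoch (a sequence of samples) into equal, non-overlapping parts."""
    if len(epoch) % n_parts != 0:
        raise ValueError("epoch length must be divisible by n_parts")
    step = len(epoch) // n_parts
    return [epoch[start:start + step] for start in range(0, len(epoch), step)]


def toy_features(epoch):
    """Hypothetical stand-in for a per-epoch extractor; Catch22 would return 22 values."""
    return [mean(epoch), max(epoch) - min(epoch)]


def concatenate_features(epochs, extract=toy_features):
    """Build one feature vector per patient by concatenating per-epoch features."""
    return [value for epoch in epochs for value in extract(epoch)]


# Five synthetic 10-second epochs for one hypothetical patient.
n_samples = 10 * SAMPLING_RATE_HZ
patient_epochs = [[(i * k) % 7 for i in range(n_samples)] for k in range(1, 6)]

# Halve every 10-second epoch to obtain 5-second epochs.
five_second_epochs = [half for epoch in patient_epochs for half in split_epoch(epoch)]

# With 5 epochs and f features per epoch, the vector has 5 * f entries
# (5 * 22 = 110 when f = 22, as with Catch22).
vector = concatenate_features(patient_epochs)
```

Swapping `toy_features` for a real 22-feature extractor would leave the splitting and concatenation logic unchanged; only the length of the resulting vector would differ.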

Master Computer Science Investigating the epoch size and feature engineering for Automated Machine Learning in EEG data analysis
