Interpretable Classification of Early Stage Parkinson's Disease from EEG

Amarpal Sahota, Amber Roguski, Matthew W. Jones, Michal Rolinski, Alan Whone, Raul Santos-Rodriguez, Zahraa S. Abdallah
2023-12-08
Abstract: Detecting Parkinson's Disease in its early stages using EEG data presents a significant challenge. This paper introduces a novel approach, representing EEG data as a 15-variate series of bandpower and peak frequency values/coefficients. The hypothesis is that this representation captures essential information from the noisy EEG signal, improving disease detection. Statistical features extracted from this representation are utilised as input for interpretable machine learning models, specifically Decision Tree and AdaBoost classifiers. Our classification pipeline is deployed within our proposed framework, which enables high-importance data types and brain regions for classification to be identified. Interestingly, our analysis reveals that while there is no significant regional importance, the N1 sleep data type exhibits statistically significant predictive power (p < 0.01) for early-stage Parkinson's Disease classification. AdaBoost classifiers trained on the N1 data type consistently outperform baseline models, achieving over 80% accuracy and recall. Our classification pipeline statistically significantly outperforms baseline models, indicating that the model has acquired useful information. Paired with the interpretability of our pipeline (the ability to view feature importances), this enables us to generate meaningful insights into the classification of early-stage Parkinson's with our N1 models. In future, these models could be deployed in the real world; the results presented in this paper indicate that more than 3 in 4 early-stage Parkinson's cases would be captured with our pipeline.
Neurons and Cognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of accurately classifying Parkinson's disease in its early stage from electroencephalogram (EEG) data. The authors represent EEG data as a 15-variate series of bandpower and peak frequency values/coefficients, aiming to capture the key information in noisy EEG signals and so improve disease detection. Statistical features extracted from this representation are the input to interpretable machine learning models, specifically Decision Tree and AdaBoost classifiers. The paper pays particular attention to the N1 sleep data type, which it finds has statistically significant predictive power for early-stage Parkinson's disease classification ($p < 0.01$); AdaBoost classifiers trained on N1 data outperform the baseline models, achieving over 80% accuracy and recall.

### Specific problems solved by the paper:

1. **Detection of early-stage Parkinson's disease**: Parkinson's disease is traditionally diagnosed only after patients show motor symptoms, by which time more than 60% of dopaminergic neurons in the brain have been lost. Early detection is therefore crucial for formulating the best treatment plan.
2. **Effective use of EEG data**: EEG is a non-invasive and relatively low-cost method for recording the electrical activity of neurons in the cerebral cortex. Extracting useful information from noisy EEG signals, however, remains a research difficulty.
3. **Interpretability of the model**: Existing classification models can perform well but often lack interpretability, which limits their value in practical medical applications. The pipeline proposed in this paper achieves high accuracy and recall and also exposes feature importances, which helps in understanding the classification results.

### Method overview:

- **Data representation**: Convert EEG signals into a 15-variate series of bandpower and peak frequency values/coefficients.
- **Feature extraction**: Extract statistical features from this representation as input for machine learning models.
- **Classification model**: Classify with Decision Tree and AdaBoost classifiers and evaluate model performance through cross-validation.
- **Importance analysis**: Use the proposed framework to determine the importance of different brain regions and EEG data types for classification.

### Main contributions:

1. **High-performance classification pipeline**: An interpretable classification pipeline that achieves over 80% accuracy and recall, statistically significantly outperforming the baseline models.
2. **Importance of N1 sleep data**: The finding that N1 sleep data has statistically significant predictive power for early-stage Parkinson's disease classification, while no brain region shows significant importance.
3. **Interpretability of the model**: Insight into the classification results through feature-importance analysis.

### Future work directions:

1. **Improve the interpretability of the model**: Further refine feature engineering and add easily interpretable features, such as sleep-spindle density.
2. **Utilization of global brain features**: Combine features from different brain regions with brain-connectivity indicators to improve model performance.
3. **Expand to other datasets**: Apply the classification pipeline to other EEG datasets to verify the generalization ability of the model and the stability of its feature importances.
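The data-representation and feature-extraction steps in the method overview can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the band edges, the 2-second window, the plain periodogram estimator, and the helper names `band_features`, `to_multivariate_series`, and `statistical_features` are all hypothetical. The sketch yields 10 variables per window (bandpower and peak frequency for five bands), because the exact makeup of the paper's 15 variables is not given in this summary.

```python
import numpy as np

# Conventional EEG band edges in Hz. Assumed for illustration,
# not taken from the paper.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}


def band_features(window, fs):
    """Return [power, peak frequency] for each band of one 1-D EEG window."""
    window = window - window.mean()              # remove the DC offset
    freqs = np.fft.rfftfreq(window.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(window)) ** 2 / (fs * window.size)  # periodogram
    df = freqs[1] - freqs[0]                     # frequency resolution in Hz
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        power = psd[mask].sum() * df             # area under the PSD in the band
        peak = freqs[mask][np.argmax(psd[mask])] # frequency of the largest bin
        feats.extend([power, peak])
    return np.array(feats)


def to_multivariate_series(signal, fs, win_sec=2.0):
    """Split a signal into non-overlapping windows; one feature row per window."""
    n = int(win_sec * fs)
    starts = range(0, signal.size - n + 1, n)
    return np.vstack([band_features(signal[s:s + n], fs) for s in starts])


def statistical_features(series):
    """Summarise each column of the series by mean, std, min and max."""
    return np.concatenate([
        series.mean(axis=0), series.std(axis=0),
        series.min(axis=0), series.max(axis=0),
    ])


# Synthetic 10 Hz tone standing in for an EEG channel (not real data).
fs = 256
t = np.arange(0, 10, 1 / fs)
signal = np.sin(2 * np.pi * 10 * t)

series = to_multivariate_series(signal, fs)   # shape: (windows, 10)
features = statistical_features(series)       # shape: (40,)
print(series.shape, features.shape)
```

Each row of `series` describes one window, so the statistics in `features` summarise how bandpower and peak frequency vary over the recording. In practice a Welch estimate would likely be preferred over the raw periodogram for noisy EEG.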
In conclusion, this paper addresses the challenge of early-stage Parkinson's disease detection by proposing a new EEG data representation and an interpretable classification pipeline, and its results offer a useful reference for future clinical applications.
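The classification and importance-analysis steps described above can be illustrated with scikit-learn. This is a hedged sketch rather than the paper's pipeline: the feature matrix is synthetic (generated by `make_classification`, not EEG data), and the number of estimators, the 5-fold stratified split, and the random seeds are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the statistical feature matrix; NOT real EEG data.
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=4, random_state=0
)

# AdaBoost over scikit-learn's default weak learner (a depth-1 decision tree).
clf = AdaBoostClassifier(n_estimators=50, random_state=0)

# Cross-validated accuracy and recall, the two metrics the summary reports.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(clf, X, y, cv=cv, scoring=["accuracy", "recall"])
print("mean accuracy:", scores["test_accuracy"].mean())
print("mean recall:", scores["test_recall"].mean())

# Fit on all data to inspect which features the ensemble relies on.
clf.fit(X, y)
importances = clf.feature_importances_
for i in np.argsort(importances)[::-1][:5]:
    print(f"feature {i}: importance {importances[i]:.3f}")
```

If each feature column is named after its data type and brain region, ranking `feature_importances_` gives the kind of per-feature view that the importance analysis relies on.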