Machine learning of brain-specific biomarkers from EEG

Philipp Bomatter, Joseph Paillard, Pilar Garces, Jörg Hipp, Denis Engemann
DOI: https://doi.org/10.1101/2023.12.15.571864
2024-01-10
Abstract: Electroencephalography (EEG) has a long history as a clinical tool to study brain function, and its potential to derive biomarkers for various applications is far from exhausted. Machine learning (ML) can guide future innovation by harnessing the wealth of complex EEG signals to isolate relevant brain activity. Yet, ML studies in EEG tend to ignore physiological artifacts, which may cause problems for deriving biomarkers specific to the central nervous system (CNS). We present a framework for conceptualizing machine learning from CNS versus peripheral signals measured with EEG. A common signal representation across the frequency spectrum based on Morlet wavelets allowed us to define traditional brain activity features (e.g. log power) and alternative inputs used by state-of-the-art ML approaches (covariance matrices). Using more than 2600 EEG recordings from large public databases (TUAB, TDBRAIN), we studied the impact of peripheral signals and artifact removal techniques on ML models in exemplary age and sex prediction analyses. Across benchmarks, basic artifact rejection improved model performance whereas further removal of peripheral signals using ICA decreased performance. Our analyses revealed that peripheral signals enable age and sex prediction. However, they explained only a fraction of the performance provided by brain signals. We show that brain signals and body signals, both reflected in the EEG, allow for prediction of personal characteristics. While these results may depend on specific prediction problems, our work suggests that great care is needed to separate these signals when the goal is to develop CNS-specific biomarkers using ML.
Neuroscience
What problem does this paper attempt to address?
This paper addresses how to use machine learning to extract biomarkers specific to the central nervous system (CNS) from electroencephalography (EEG). It focuses on physiological artifacts, that is, signals from other parts of the body (such as eye movements, muscle activity and cardiac activity), which can interfere with deriving biomarkers from CNS signals. The researchers propose a conceptual framework to distinguish brain signals from peripheral signals, and test how different preprocessing techniques affect the performance of machine-learning models in age and sex prediction analyses. The study also explores how much non-brain signals contribute to predicting personal characteristics and how they affect the predictive ability of the models.

The core objectives of the paper include:

1. **Construct a conceptual framework for interpretable brain-specific EEG biomarkers**: Explicitly account for peripheral physiological signals that may themselves be predictive.
2. **Test whether machine-learning models systematically use non-brain signals if EEG signals are not fully preprocessed**: Evaluate the effects of different preprocessing methods (such as automatic artifact rejection and independent component analysis) on model performance, and in particular whether they improve or impair the model's ability to use brain signals.

Through these studies, the authors hope to provide guidance for developing more accurate and more interpretable CNS-specific biomarkers, while emphasizing the importance of carefully separating brain signals from artifacts when processing EEG data.
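
The abstract mentions a common Morlet-wavelet representation from which both log power and covariance matrices are derived. The following is a minimal sketch of that idea in NumPy, not the authors' pipeline: the function names (`morlet_wavelet`, `wavelet_features`), the `n_cycles=5` setting, the frequency grid and the synthetic three-channel signal are all assumptions made for illustration.

```python
import numpy as np


def morlet_wavelet(freq, sfreq, n_cycles=5.0):
    """Complex Morlet wavelet centered at `freq` Hz, sampled at `sfreq` Hz."""
    sigma_t = n_cycles / (2.0 * np.pi * freq)  # temporal width in seconds
    t = np.arange(-4.0 * sigma_t, 4.0 * sigma_t, 1.0 / sfreq)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
    return wavelet / np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy


def wavelet_features(data, sfreq, freqs, n_cycles=5.0):
    """Per-frequency log power and covariance from one shared representation.

    data: array of shape (n_channels, n_times).
    Returns log_power (n_freqs, n_channels) and
    covs (n_freqs, n_channels, n_channels).
    """
    n_channels = data.shape[0]
    log_power = np.empty((len(freqs), n_channels))
    covs = np.empty((len(freqs), n_channels, n_channels))
    for i, freq in enumerate(freqs):
        w = morlet_wavelet(freq, sfreq, n_cycles)
        # Convolve every channel with the same complex wavelet.
        conv = np.array([np.convolve(ch, w, mode="same") for ch in data])
        # Time-averaged cross-spectrum; its real part serves as the covariance.
        csd = conv @ conv.conj().T / conv.shape[1]
        covs[i] = csd.real
        # Log power is the log of the covariance diagonal.
        log_power[i] = np.log10(np.diag(covs[i]))
    return log_power, covs


# Synthetic example (assumed data): channel 0 carries a 10 Hz oscillation.
rng = np.random.default_rng(0)
sfreq = 250.0
times = np.arange(0, 10, 1.0 / sfreq)
data = 0.1 * rng.standard_normal((3, times.size))
data[0] += np.sin(2 * np.pi * 10.0 * times)

freqs = [5.0, 10.0, 20.0]
log_power, covs = wavelet_features(data, sfreq, freqs)
```

In this sketch, log power is simply the diagonal of each covariance matrix on a log scale, which is one way to see why both feature types can be read off the same wavelet representation.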