Towards Unified Neural Decoding of Perceived, Spoken and Imagined Speech from EEG Signals

Jung-Sun Lee, Ha-Na Jo, Seo-Hyun Lee
2024-11-14
Abstract: Brain signals carry diverse information relevant to human actions and mental imagery, making them crucial to interpreting and understanding human intentions. Brain-computer interface technology leverages this brain activity to generate external commands for controlling the environment, offering critical advantages to individuals with paralysis or locked-in syndrome. Within the brain-computer interface domain, brain-to-speech research has gained attention, focusing on the direct synthesis of audible speech from brain signals. Most current studies decode speech from brain activity using invasive techniques and emphasize spoken speech data. However, humans express various speech states, and distinguishing these states through non-invasive approaches remains a significant yet challenging task. This research investigated the effectiveness of deep learning models for non-invasive neural signal decoding, with an emphasis on distinguishing between different speech paradigms, including perceived, overt, whispered, and imagined speech, across multiple frequency bands. The model utilizing the spatial convolutional neural network module demonstrated superior performance compared to other models, especially in the gamma band. Additionally, imagined speech in the theta frequency band, where deep learning also showed strong effects, exhibited statistically significant differences compared to the other speech paradigms.
Artificial Intelligence, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the problem of decoding perceived, spoken (overt), whispered, and imagined speech states from electroencephalogram (EEG) signals, using non-invasive recording and deep-learning models. Specifically, the researchers focus on how to distinguish these speech paradigms from one another and evaluate model performance in multiple frequency bands (delta, theta, alpha, beta, and gamma). The main contributions of the paper are as follows:

1. **Non-invasive decoding**: Most existing studies rely on invasive techniques to decode speech, whereas this study uses non-invasive methods, which reduce the risk and discomfort to subjects.
2. **Multi-paradigm distinction**: The study covers not only spoken speech but also perceived, whispered, and imagined speech, so the models are evaluated on a broader range of speech activity.
3. **Deep-learning models**: The study uses deep-learning models, in particular one with a spatial CNN (convolutional neural network) module, to extract features directly from the raw EEG data, with the aim of improving decoding accuracy and robustness.
4. **Frequency-band analysis**: The analysis is carried out per frequency band and finds that some bands perform better than others; the model with the spatial CNN module performed best in the gamma band, and imagined speech in the theta band differed significantly from the other speech paradigms.

Through these methods, the researchers hope to develop a more effective brain-computer interface (BCI) system and to provide a new means of communication for people who cannot communicate normally because of paralysis or locked-in syndrome.
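To make the frequency-band analysis concrete, the following is a minimal sketch of how an EEG recording might be split into the five bands before per-band decoding. It is not the authors' pipeline: the band edges in `BANDS`, the function names `split_into_bands` and `band_power`, and the FFT-mask filter are all assumptions chosen for illustration, since this summary gives neither the cut-off frequencies nor the filter design.

```python
import numpy as np

# Hypothetical band edges in Hz. The paper names these bands, but this
# summary does not state the cut-offs, so these values are assumptions.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 100.0),
}


def split_into_bands(eeg, fs, bands=BANDS):
    """Split EEG shaped (channels, samples) into band-limited copies.

    Uses a simple FFT mask (zeroing bins outside each band), which is an
    illustrative stand-in for whatever filter the study actually used.
    """
    eeg = np.asarray(eeg, dtype=float)
    n = eeg.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)      # frequency of each FFT bin
    spectrum = np.fft.rfft(eeg, axis=-1)        # per-channel spectrum
    out = {}
    for name, (lo, hi) in bands.items():
        mask = (freqs >= lo) & (freqs < hi)     # keep bins inside the band
        out[name] = np.fft.irfft(spectrum * mask, n=n, axis=-1)
    return out


def band_power(eeg, fs, bands=BANDS):
    """Mean squared amplitude per channel in each band."""
    filtered = split_into_bands(eeg, fs, bands)
    return {name: np.mean(x ** 2, axis=-1) for name, x in filtered.items()}
```

As a usage example, calling `band_power(eeg, fs=250)` on an array of shape `(channels, samples)` returns one power value per channel for each band. In a per-band decoding setup, each array returned by `split_into_bands` would be passed to the classifier separately so that accuracies can be compared across bands.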