Abstract: Identifying auditory attention by comparing auditory stimuli and corresponding brain responses is known as auditory attention decoding (AAD). The majority of AAD algorithms utilize the so-called envelope entrainment mechanism, whereby auditory attention is identified by how the envelope of the auditory stream drives variation in the electroencephalography (EEG) signal. However, neural processing can also be decoded based on endogenous cognitive responses, in this case, neural responses evoked by attention to specific words in a speech stream. This approach is largely unexplored in the field of AAD but leads to a single-word auditory attention decoding problem in which an epoch of an EEG signal timed to a specific word is labeled as attended or unattended. This paper presents a deep learning approach, based on EEGNet, to address this challenge. We conducted a subject-independent evaluation on an event-based AAD dataset with three different paradigms: word category oddball, word category with competing speakers, and competing speech streams with targets. The results demonstrate that the adapted model is capable of exploiting cognitive-related spatiotemporal EEG features and achieving at least 58% accuracy on the most realistic competing paradigm for the unseen subjects. To our knowledge, this is the first study dealing with this problem.

What problem does this paper attempt to address?

The paper addresses auditory attention decoding (AAD) at the level of single words, using electroencephalogram (EEG) signals recorded in a multi-speaker environment. Specifically, the researchers explore deep-learning methods, in particular the EEGNet architecture, to classify whether the subject attended to a particular word. This differs from the traditional approach based on audio envelope reconstruction, which relies mainly on exogenous neural responses driven by the external stimulus. The proposed method instead focuses on endogenous cognitive responses, that is, event-related potentials (ERPs) evoked by attention to specific words, and uses them to classify each word-locked EEG epoch as attended or unattended.

### Main Challenges:

1. **Small and Imbalanced Dataset**: Because of the experimental design, the dataset is relatively small and imbalanced (at a ratio of approximately 1:5), which makes model training difficult.
2. **Generalization Ability of the Model**: The model must perform well on unseen subjects, especially in cross-paradigm situations.

### Solutions:

1. **Data Augmentation**: To compensate for the small, imbalanced dataset, the researchers proposed two data augmentation methods:
   - **Average Up-sampling**: New samples are generated by averaging random samples within each category, increasing the amount and diversity of data.
   - **ERP Simulation**: New target samples are generated by adding target (attended) ERP waveforms to non-target (unattended) samples, introducing more variability.
2. **Model Selection**: The lightweight EEGNet architecture was selected. It extracts both spatial and temporal features and has few parameters, which makes it suitable for limited datasets.
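The two augmentation ideas described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the group size `k`, the `scale` factor, and the toy array shapes are all hypothetical, and the ERP template is assumed to be the plain average of the available target epochs. The toy data also assumes that targets are the minority class in the roughly 1:5 split.

```python
import numpy as np


def average_upsample(epochs, n_new, k=3, rng=None):
    """Create n_new synthetic epochs, each the mean of k random epochs.

    epochs: array (n_epochs, n_channels, n_samples), all from ONE class.
    k is a hypothetical choice; the summary does not give the group size.
    """
    rng = np.random.default_rng(rng)
    n_epochs = epochs.shape[0]
    k = min(k, n_epochs)
    new = np.empty((n_new,) + epochs.shape[1:], dtype=float)
    for i in range(n_new):
        # Draw k distinct epochs of this class and average them.
        idx = rng.choice(n_epochs, size=k, replace=False)
        new[i] = epochs[idx].mean(axis=0)
    return new


def simulate_erp(nontarget_epochs, target_epochs, n_new, scale=1.0, rng=None):
    """Create n_new synthetic target epochs from non-target epochs.

    Assumption: the "target ERP waveform" is the average of all target
    epochs; it is added (times scale) to randomly drawn non-target epochs.
    """
    rng = np.random.default_rng(rng)
    template = target_epochs.mean(axis=0)  # (n_channels, n_samples)
    idx = rng.choice(nontarget_epochs.shape[0], size=n_new, replace=True)
    return nontarget_epochs[idx] + scale * template


# Toy data: 20 target and 100 non-target epochs, 8 channels, 64 samples.
demo_rng = np.random.default_rng(0)
targets = demo_rng.normal(size=(20, 8, 64))
nontargets = demo_rng.normal(size=(100, 8, 64))

# Generate 80 extra target epochs so both classes hold 100 epochs.
extra_avg = average_upsample(targets, n_new=40, k=3, rng=1)
extra_sim = simulate_erp(nontargets, targets, n_new=40, rng=2)
balanced_targets = np.concatenate([targets, extra_avg, extra_sim])
print(balanced_targets.shape, nontargets.shape)
```

In a subject-independent setting, augmentation of this kind would normally be applied to the training subjects only, so that no synthetic epoch carries information from the held-out subject.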
### Experimental Results:

- **Subject-Pool Performance**: In all three paradigms, the model trained with data augmentation significantly outperforms the model trained without it. In Paradigm 1 and Paradigm 2 in particular, the paradigm-specific models perform better than the paradigm-independent models.
- **Leave-One-Out Validation**: For unseen subjects, the model trained with data augmentation still generalizes, although performance drops slightly. The abstract reports at least 58% accuracy on the most realistic competing-speech paradigm for unseen subjects.

### Conclusion:

The study shows that deep-learning methods, in particular the EEGNet architecture, can decode auditory attention from EEG responses to single words. The data augmentation strategy is crucial for model performance when the dataset is small and imbalanced. Future research could explore how to combine endogenous and exogenous responses to improve the robustness and generalization ability of the model.
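The leave-one-out evaluation on unseen subjects can be sketched as a leave-one-subject-out split: every epoch from one subject is held out for testing while the model is trained on the remaining subjects. The helper names below are hypothetical. The summary does not state which metric the authors report beyond "accuracy", so the balanced-accuracy helper is included only as an illustration of a metric that is less misleading under a 1:5 class imbalance.

```python
import numpy as np


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for each subject.

    subject_ids: one entry per epoch naming the subject it came from.
    """
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject
        # Train on all other subjects, test on the held-out one.
        yield subject, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Toy example: five epochs recorded from three subjects.
for subject, train_idx, test_idx in leave_one_subject_out([0, 0, 1, 1, 2]):
    print(subject, train_idx.tolist(), test_idx.tolist())
```

A classifier that always predicts "unattended" would reach high plain accuracy on imbalanced data but only 0.5 balanced accuracy, which is why a per-class metric is a common companion to this kind of split.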

Single-word Auditory Attention Decoding Using Deep Learning Model

Decoding auditory attention (in real time) with EEG

AADNet: An End-to-End Deep Learning Model for Auditory Attention Decoding

Deep learning-based auditory attention decoding in listeners with hearing impairment

Auditory Attention Decoding from EEG Using Convolutional Recurrent Neural Network

EEG-Based Short-Time Auditory Attention Detection Using Multi-Task Deep Learning

Auditory attention decoding from electroencephalography based on long short-term memory networks

Detecting the Locus of Auditory Attention Based on the Spectro-Spatial-temporal Analysis of EEG

Auditory Attention Decoding with Task-Related Multi-View Contrastive Learning

Using Ear-EEG to Decode Auditory Attention in Multiple-speaker Environment

EEG-based auditory attention decoding using speech-level-based segmented computational models

Comparison of Two-Talker Attention Decoding from EEG with Nonlinear Neural Networks and Linear Methods

A Neural-Inspired Architecture for EEG-Based Auditory Attention Detection

Investigating Self-Supervised Deep Representations for EEG-based Auditory Attention Decoding

A DenseNet-based method for decoding auditory spatial attention with EEG

EEG-Based Auditory Attention Detection via Frequency and Channel Neural Attention

Decoding Selective Auditory Attention with EEG Using a Transformer Model

Deep Neural Networks on EEG Signals to Predict Auditory Attention Score Using Gramian Angular Difference Field

Auditory Attention Detection via Cross-Modal Attention

SADNet: Sustained Attention Decoding in a Driving Task by Self-Attention Convolutional Neural Network

Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation