Abstract: Decoding the directional focus of an attended speaker from listeners' electroencephalogram (EEG) signals is essential for developing brain-computer interfaces to improve the quality of life for individuals with hearing impairment. Previous works have concentrated on binary directional focus decoding, i.e., determining whether the attended speaker is on the left or right side of the listener. However, a more precise decoding of the exact direction of the attended speaker is necessary for effective speech processing. Additionally, audio spatial information has not been effectively leveraged, resulting in suboptimal decoding results. In this paper, it is found that on the recently presented dataset with 14-class directional focus, models relying exclusively on EEG inputs exhibit significantly lower accuracy when decoding the directional focus in both leave-one-subject-out and leave-one-trial-out scenarios. By integrating audio spatial spectra with EEG features, the decoding accuracy can be effectively improved. The CNN, LSM-CNN, and Deformer models are employed to decode the directional focus from listeners' EEG signals and audio spatial spectra. The proposed Sp-EEG-Deformer model achieves notable 14-class decoding accuracies of 55.35% and 57.19% in leave-one-subject-out and leave-one-trial-out scenarios, respectively, with a decision window of 1 second. Experimental results indicate that decoding accuracy increases as the number of alternative directions decreases. These findings suggest the efficacy of our proposed dual-modal directional focus decoding strategy.

What problem does this paper attempt to address?

The main problem this paper addresses is improving the accuracy of decoding the direction of the attended speaker from the listener's electroencephalogram (EEG) signals, especially in multi-class direction decoding tasks. Most existing methods focus on binary direction decoding (i.e., determining whether the attended speaker is on the left or right) and cannot decode the specific direction of attention. In addition, audio spatial information is not fully utilized, resulting in suboptimal decoding results.

### Summary of main problems:

1. **Extension from binary to multi-class decoding**: Previous studies mainly focused on binary direction decoding, that is, determining whether the attended speaker is on the left or right. Practical applications, however, require decoding the specific direction of attention.
2. **Insufficient utilization of audio spatial information**: Existing models fail to make effective use of the spatial information in the audio, which limits decoding performance.
3. **Impact of cross-validation paradigms**: Previous studies may have overestimated decoding accuracy because of trial-specific features, so more stringent cross-validation methods are required to evaluate model performance.

### Core contributions of the paper:

- Use a recently presented dataset containing 14 different speaker directions for multi-class direction decoding tasks.
- Combine the audio spatial spectrum with EEG features and enhance decoding performance by introducing a fusion module.
- Evaluate model performance in the more challenging leave-one-out cross-validation (LOO-CV) scenarios, including leave-one-trial-out (LOTO) and leave-one-subject-out (LOSO).
- Use multiple deep-learning models (CNN, LSM-CNN, Deformer) for experiments and propose a new Sp-EEG-Deformer model, which achieves a significant performance improvement in the 14-class direction decoding task.

### Solutions:

By combining EEG signals and audio spatial spectra and using an improved convolutional neural network (CNN), a learnable spatial mapping (LSM) module, and a Deformer model, the authors propose a bimodal direction decoding strategy to improve the accuracy of multi-class direction decoding.

### Experimental results:

The experimental results show that decoding accuracy improves as the number of alternative directions decreases. In particular, with a 1-second decision window, the proposed Sp-EEG-Deformer model achieves 14-class decoding accuracies of 55.35% and 57.19% in the leave-one-subject-out and leave-one-trial-out scenarios, respectively. These findings support the effectiveness of the proposed bimodal direction decoding strategy and provide new ideas for the development of more advanced brain-computer interfaces.
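
The two leave-one-out protocols named above (LOSO and LOTO) differ only in which label defines a fold. The following is a minimal sketch of that idea, not the authors' evaluation code: the helper `leave_one_group_out`, the toy sizes (3 subjects, 2 trials, 4 decision windows per trial), and the choice to hold out a trial index across all subjects are all assumptions made for illustration.

```python
import numpy as np

def leave_one_group_out(groups):
    """Yield (held_out_label, train_idx, test_idx) for each distinct group label.

    Hypothetical helper: every sample whose label equals the held-out label
    goes to the test set, everything else goes to the training set.
    """
    groups = np.asarray(groups)
    for held_out in np.unique(groups):
        test_mask = groups == held_out
        yield held_out, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Toy metadata (assumed sizes, not the real dataset):
# 3 subjects x 2 trials x 4 decision windows = 24 windows in total.
subjects = np.repeat(np.arange(3), 8)             # subject ID of each window
trials = np.tile(np.repeat(np.arange(2), 4), 3)   # trial ID of each window

# Leave-one-subject-out: the test subject is never seen during training.
loso_folds = list(leave_one_group_out(subjects))

# Leave-one-trial-out: all windows of one trial are held out together, so
# windows from the same trial never appear in both training and test sets.
loto_folds = list(leave_one_group_out(trials))

for name, folds in (("LOSO", loso_folds), ("LOTO", loto_folds)):
    for held_out, train_idx, test_idx in folds:
        print(f"{name} held out {held_out}: "
              f"{len(train_idx)} train / {len(test_idx)} test windows")
```

Splitting by whole trials or whole subjects, rather than by individual windows, is what keeps trial-specific features from leaking between the training and test sets. For reference, if the 14 directions are balanced, chance-level accuracy is 1/14, or about 7.1%.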

Multi-class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum
