Multi-class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum

Yuanming Zhang, Jing Lu, Fei Chen, Haoliang Du, Xia Gao, Zhibin Lin
2025-01-09
Abstract: Decoding the directional focus of an attended speaker from listeners' electroencephalogram (EEG) signals is essential for developing brain-computer interfaces to improve the quality of life for individuals with hearing impairment. Previous works have concentrated on binary directional focus decoding, i.e., determining whether the attended speaker is on the left or right side of the listener. However, a more precise decoding of the exact direction of the attended speaker is necessary for effective speech processing. Additionally, audio spatial information has not been effectively leveraged, resulting in suboptimal decoding results. In this paper, it is found that on the recently presented dataset with 14-class directional focus, models relying exclusively on EEG inputs exhibit significantly lower accuracy when decoding the directional focus in both leave-one-subject-out and leave-one-trial-out scenarios. By integrating audio spatial spectra with EEG features, the decoding accuracy can be effectively improved. The CNN, LSM-CNN, and Deformer models are employed to decode the directional focus from listeners' EEG signals and audio spatial spectra. The proposed Sp-EEG-Deformer model achieves 14-class decoding accuracies of 55.35% and 57.19% in leave-one-subject-out and leave-one-trial-out scenarios, respectively, with a decision window of 1 second. Experimental results indicate that decoding accuracy increases as the number of alternative directions decreases. These findings suggest the efficacy of the proposed dual-modal directional focus decoding strategy.
Sound, Artificial Intelligence, Computation and Language, Audio and Speech Processing
What problem does this paper attempt to address?
The paper aims to improve the accuracy of decoding the direction of the attended speaker from a listener's electroencephalogram (EEG) signals, particularly in multi-class direction decoding. Most existing methods address only binary direction decoding (determining whether the attended speaker is on the left or right) and cannot resolve the specific direction of attention. In addition, audio spatial information is not fully utilized, which leads to suboptimal decoding results.

### Summary of main problems:

1. **Extension from binary to multi-class decoding**: Previous studies mainly focused on binary direction decoding, that is, determining whether the attended speaker is on the left or right. Practical applications require decoding the specific direction of attention.
2. **Insufficient utilization of audio spatial information**: Existing models fail to effectively use the spatial information of the audio, which limits decoding performance.
3. **Impact of cross-validation paradigms**: Previous studies may have overestimated decoding accuracy because models can exploit trial-specific features, so more stringent cross-validation methods are required to evaluate model performance.

### Core contributions of the paper:

- Use a recently presented dataset containing 14 different speaker directions for multi-class direction decoding.
- Combine the audio spatial spectrum with EEG features and enhance decoding performance by introducing a fusion module.
- Evaluate model performance in the more challenging leave-one-out cross-validation (LOO-CV) scenarios, namely leave-one-trial-out (LOTO) and leave-one-subject-out (LOSO).
- Run experiments with multiple deep-learning models (CNN, LSM-CNN, Deformer) and propose a new Sp-EEG-Deformer model, which improves accuracy in the 14-class direction decoding task.

### Solutions:

By combining EEG signals with audio spatial spectra and using an improved convolutional neural network (CNN), a learnable spatial mapping (LSM) module, and a Deformer model, the authors propose a bimodal direction decoding strategy to improve the accuracy of multi-class direction decoding.

### Experimental results:

The experimental results show that decoding accuracy improves as the number of alternative directions decreases. In the leave-one-subject-out and leave-one-trial-out scenarios, the proposed Sp-EEG-Deformer model achieves 14-class decoding accuracies of 55.35% and 57.19%, respectively, with a 1-second decision window. These findings support the effectiveness of the proposed bimodal direction decoding strategy and offer a direction for developing more advanced brain-computer interfaces.
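
The leave-one-subject-out (LOSO) and leave-one-trial-out (LOTO) evaluations mentioned above both hold out every decision window that shares a group label, so that no window from the held-out subject or trial leaks into training. The following is a minimal sketch of that splitting logic, not the authors' code: the helper `leave_one_group_out`, the toy layout of 3 subjects with 2 trials and 2 windows each, and the choice to hold out a trial index across all subjects are all assumptions made for illustration.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[int],
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices) per distinct group.

    Every sample whose group label equals the held-out label goes to the
    test set; all other samples go to the training set.
    """
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Hypothetical toy layout (assumption): 3 subjects x 2 trials x 2 windows.
# Each list entry labels one decision window.
subjects = [s for s in range(3) for _ in range(4)]
trials = [t for _ in range(3) for t in (0, 0, 1, 1)]

# LOSO: one fold per subject; the held-out subject is never seen in training.
loso_folds = list(leave_one_group_out(subjects))

# LOTO: one fold per trial index; all windows of that trial are held out,
# so trial-specific features cannot be learned from the test trial.
loto_folds = list(leave_one_group_out(trials))

for name, folds in (("LOSO", loso_folds), ("LOTO", loto_folds)):
    for held_out, train, test in folds:
        print(f"{name} held out {held_out}: {len(train)} train, {len(test)} test")
```

Splitting by group rather than by individual window is what makes these protocols stricter than a random split: windows cut from the same trial are strongly correlated, and a random split would place some of them on both sides of the train/test boundary.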