Electroencephalogram-based Multi-class Decoding of Attended Speakers' Direction with Audio Spatial Spectrum

Yuanming Zhang, Jing Lu, Zhibin Lin, Fei Chen, Haoliang Du, Xia Gao
2024-11-11
Abstract: Decoding the directional focus of an attended speaker from listeners' electroencephalogram (EEG) signals is essential for developing brain-computer interfaces to improve the quality of life for individuals with hearing impairment. Previous works have concentrated on binary directional focus decoding, i.e., determining whether the attended speaker is on the left or right side of the listener. However, a more precise decoding of the exact direction of the attended speaker is necessary for effective speech processing. Additionally, audio spatial information has not been effectively leveraged, resulting in suboptimal decoding results. In this paper, we observe that, on our recently presented dataset with 15-class directional focus, models relying exclusively on EEG inputs exhibit significantly lower accuracy when decoding the directional focus in both leave-one-subject-out and leave-one-trial-out scenarios. By integrating audio spatial spectra with EEG features, the decoding accuracy can be effectively improved. We employ the CNN, LSM-CNN, and EEG-Deformer models to decode the directional focus from listeners' EEG signals with the auxiliary audio spatial spectra. The proposed Sp-Aux-Deformer model achieves notable 15-class decoding accuracies of 57.48% and 61.83% in leave-one-subject-out and leave-one-trial-out scenarios, respectively.
Sound, Artificial Intelligence, Computation and Language, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the low accuracy obtained when the direction of the attended speaker is decoded from electroencephalogram (EEG) signals alone in multi-class directional attention decoding. Most existing research has focused on binary directional attention decoding, that is, determining whether the attended speaker is on the listener's left or right. More effective speech processing, however, requires decoding the specific direction of the attended speaker. In addition, previous research has not made effective use of audio spatial information, which leads to unsatisfactory decoding results. To address these problems, the authors combine the audio spatial spectrum with EEG features to improve multi-class directional attention decoding. They use convolutional neural network (CNN), LSM-CNN, and EEG-Deformer models, and introduce a fusion module to integrate audio spatial information with EEG features. Experimental results show that the proposed Sp-Aux-Deformer model achieves 15-class decoding accuracies of 57.48% and 61.83% in the leave-one-subject-out and leave-one-trial-out scenarios, respectively.
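The two evaluation scenarios differ only in which data is held out for testing. The following is a minimal sketch of how such splits could be built, not the authors' pipeline. The `Sample` record, the toy data, and both function names are hypothetical, and the sketch assumes that leave-one-trial-out holds out the same trial index across all subjects, which the abstract does not specify.

```python
from collections import namedtuple

# Hypothetical record: one EEG decision window with its metadata.
# `label` stands in for one of the 15 direction classes (0..14).
Sample = namedtuple("Sample", ["subject", "trial", "label"])


def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test); each fold tests on one unseen subject."""
    for subject in sorted({s.subject for s in samples}):
        train = [s for s in samples if s.subject != subject]
        test = [s for s in samples if s.subject == subject]
        yield subject, train, test


def leave_one_trial_out(samples):
    """Yield (held_out_trial, train, test); each fold tests on one unseen trial.

    Assumption: the held-out trial index is removed for every subject.
    """
    for trial in sorted({s.trial for s in samples}):
        train = [s for s in samples if s.trial != trial]
        test = [s for s in samples if s.trial == trial]
        yield trial, train, test


# Toy data: 3 subjects x 4 trials, with made-up direction labels.
samples = [
    Sample(subject=s, trial=t, label=(s * 4 + t) % 15)
    for s in range(3)
    for t in range(4)
]

loso_folds = list(leave_one_subject_out(samples))
loto_folds = list(leave_one_trial_out(samples))
```

In the leave-one-subject-out folds the test subject never appears in training, so the score reflects generalization to a new listener. In the leave-one-trial-out folds every subject contributes training data, so the score reflects generalization to new recordings from listeners the model has already seen.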