End-to-end translation of human neural activity to speech with a dual-dual generative adversarial network

Yina Guo, Ting Liu, Xiaofei Zhang, Anhong Wang, Wenwu Wang
DOI: https://doi.org/10.1016/j.knosys.2023.110837
IF: 8.139
2023-07-27
Knowledge-Based Systems
Abstract: In a recent study of auditory evoked potential (AEP) based brain-computer interface (BCI), it was shown that, with an encoder–decoder framework, it is possible to translate human neural activity to speech (T-CAS). Current encoder–decoder-based methods achieve T-CAS often with a two-step approach where the information is passed between the encoder and decoder with a shared vector of reduced dimension, which, however, may result in information loss. In this paper, we propose an end-to-end model to translate human neural activity to speech (ET-CAS) by introducing a dual-dual generative adversarial network (Dual-DualGAN) for cross-domain mapping between electroencephalogram (EEG) and speech signals. In this model, we bridge the EEG and speech signals by introducing transition signals which are obtained by cascading the corresponding EEG and speech signals in a certain proportion. We then learn the mappings between the speech/EEG signals and the transition signals. We also develop a new EEG dataset where the attention of the participants is detected before the EEG signals are recorded to ensure that the participants have good attention in listening to speech utterances. The proposed method can translate word-length and sentence-length sequences of neural activity to speech. Experimental results show that the proposed method significantly outperforms state-of-the-art methods on both words and sentences of auditory stimulus.
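The transition signal described in the abstract can be illustrated with a minimal sketch. The abstract only says the EEG and speech signals are cascaded "in a certain proportion", so everything concrete here is an assumption made for illustration: the function name `make_transition_signal`, the parameter `eeg_fraction`, the requirement that both signals are already aligned to the same length, and the reading of "cascading" as taking the leading part of the EEG segment followed by the remaining part of the speech segment. The paper's actual construction may differ.

```python
from typing import List, Sequence


def make_transition_signal(
    eeg: Sequence[float],
    speech: Sequence[float],
    eeg_fraction: float = 0.5,
) -> List[float]:
    """Build a hypothetical transition signal from paired EEG and speech samples.

    Assumption (not taken from the paper): the two signals have equal length,
    and the transition signal is the first `eeg_fraction` of the EEG samples
    followed by the remaining samples of the speech signal.
    """
    if len(eeg) != len(speech):
        raise ValueError("eeg and speech must have the same number of samples")
    if not 0.0 <= eeg_fraction <= 1.0:
        raise ValueError("eeg_fraction must lie in [0, 1]")

    # Index at which the signal switches from EEG samples to speech samples.
    split = round(len(eeg) * eeg_fraction)
    return list(eeg[:split]) + list(speech[split:])
```

Under this reading, `eeg_fraction=1.0` returns the EEG segment unchanged and `eeg_fraction=0.0` returns the speech segment unchanged, so intermediate values give signals that sit between the two domains, which is the bridging role the abstract assigns to the transition signals.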
computer science, artificial intelligence