Abstract: Objective: A brain-computer interface (BCI) can translate intentions directly into instructions and greatly improve the interaction experience for disabled people and for some specific interactive applications. To improve the efficiency of BCIs, this study explores the feasibility of an audio-assisted visual BCI speller and a deep learning-based strategy for decoding single-trial event-related potentials (ERPs). Approach: A two-stage BCI speller combining the motion-onset visual evoked potential (mVEP) and semantically congruent audio-evoked ERPs was designed to output the target characters. In the first stage, different groups of characters were presented simultaneously at different locations in the visual field, and the stimuli were coded to the mVEP using a new space division multiple access scheme. In the second stage, the target character was output based on the audio-assisted mVEP. A spatial-temporal attention-based convolutional neural network (STA-CNN) was proposed to recognize the single-trial ERP components. The CNN can learn 2-dimensional features, including the spatial information of different activated channels and the time dependence among ERP components. In addition, the STA mechanism can enhance the discriminative event-related features by adaptively learning probability weights. Main results: The performance of the proposed two-stage audio-assisted visual BCI paradigm and the STA-CNN model was evaluated using electroencephalogram (EEG) data recorded from 10 subjects. The average classification accuracy of the proposed STA-CNN reached 59.6% and 77.7% for the first and second stages, respectively, and was consistently significantly higher than that of the comparison methods (p < 0.05). Significance: The proposed two-stage audio-assisted visual paradigm showed great potential for use in BCI spellers. Moreover, analysis of the attention weights across the time sequence and spatial topographies showed that STA-CNN could effectively extract interpretable spatiotemporal EEG features.
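
The idea of enhancing event-related features with adaptively learned probability weights can be illustrated with a minimal NumPy sketch. This is not the authors' STA-CNN: the function name `spatial_temporal_attention`, the linear scoring vectors `w_spatial` and `w_temporal`, and the 8-channel, 50-sample trial are all hypothetical, and random values stand in for parameters that a real model would learn during training. The sketch only shows how softmax-normalized weights over channels and over time points can rescale a single-trial EEG array.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_temporal_attention(trial, w_spatial, w_temporal):
    """Reweight one EEG trial of shape (channels, samples).

    w_spatial  (samples,):  hypothetical scoring vector giving one score per channel.
    w_temporal (channels,): hypothetical scoring vector giving one score per time point.
    """
    spatial_scores = trial @ w_spatial       # shape: (channels,)
    temporal_scores = w_temporal @ trial     # shape: (samples,)
    spatial_weights = softmax(spatial_scores)    # probabilities over channels
    temporal_weights = softmax(temporal_scores)  # probabilities over time points
    # Scale every channel and every time point by its attention weight.
    weighted = trial * spatial_weights[:, None] * temporal_weights[None, :]
    return weighted, spatial_weights, temporal_weights

# Assumed toy data: 8 channels, 50 time samples, random stand-in parameters.
rng = np.random.default_rng(0)
trial = rng.standard_normal((8, 50))
w_spatial = 0.1 * rng.standard_normal(50)
w_temporal = 0.1 * rng.standard_normal(8)

weighted, a_s, a_t = spatial_temporal_attention(trial, w_spatial, w_temporal)
```

Because each weight vector sums to one, plotting `a_s` over a scalp layout or `a_t` over time is one way such weights might be inspected for interpretability, in the spirit of the analysis the abstract describes.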

End-to-end translation of human neural activity to speech with a dual-dual generative adversarial network

Fully end-to-end EEG to speech translation using multi-scale optimized dual generative adversarial network with cycle-consistency loss

Multimodal Speech Recognition Using EEG and Audio Signals: A Novel Approach for Enhancing ASR Systems

End-to-end Code-switched TTS with Mix of Monolingual Recordings

A neural speech decoding framework leveraging deep learning and speech synthesis

Dual-TSST: A Dual-Branch Temporal-Spectral-Spatial Transformer Model for EEG Decoding

Common Spatial Generative Adversarial Networks based EEG Data Augmentation for Cross-Subject Brain-Computer Interface

A Multi-Scale Activity Transition Network for Data Translation in EEG Signals Decoding

A novel brain-computer interface based on audio-assisted visual evoked EEG and spatial-temporal attention CNN

E2SGAN: EEG-to-SEEG Translation with Generative Adversarial Networks

Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement

Geometric neural network based on phase space for BCI-EEG decoding

NeuSpeech: Decode Neural signal as Speech

An End-to-End EEG Channel Selection Method with Residual Gumbel Softmax for Brain-Assisted Speech Enhancement

Dual-Path Transformer-Based GAN for Co-speech Gesture Synthesis

Enhancing EEG Signal Generation through a Hybrid Approach Integrating Reinforcement Learning and Diffusion Models

Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder

Electroencephalographic Signal Data Augmentation Based on Improved Generative Adversarial Network

EEGGAN-Net: enhancing EEG signal classification through data augmentation

Towards Linguistic Neural Representation Learning and Sentence Retrieval from Electroencephalogram Recordings

Spoken Speech Enhancement using EEG