Abstract:
Objective: This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior work handles only electrodes arranged on a 2D grid (i.e., an electrocorticographic, or ECoG, array) and data from a single patient. We aim to design a deep-learning model architecture that can accommodate both surface (ECoG) and depth (stereotactic EEG, or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placement, and the trained model should perform well on participants unseen during training.
Approach: We propose a novel transformer-based model architecture, named SwinTW, that can work with arbitrarily positioned electrodes by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train both subject-specific models, using data from a single participant, and multi-patient models, exploiting data from multiple participants.
Main Results: Subject-specific models using only low-density 8x8 ECoG data achieved a high Pearson correlation coefficient (PCC) between the decoded and ground-truth spectrograms (PCC = 0.817, N = 43 participants), outperforming our prior convolutional ResNet model and the 3D Swin transformer model. Incorporating the additional strip, depth, and grid electrodes available in each participant (N = 39) led to further improvement (PCC = 0.838). For participants with only sEEG electrodes (N = 9), subject-specific models still achieve comparable performance, with an average PCC of 0.798. The multi-subject models achieved high performance on unseen participants, with an average PCC of 0.765 in leave-one-out cross-validation.
Significance: The proposed SwinTW decoder enables future speech neuroprostheses to use any electrode placement that is clinically optimal or feasible for a particular participant, including depth electrodes alone, which are more routinely implanted in chronic neurosurgical procedures.
Importantly, the generalizability of the multi-patient models suggests the exciting possibility of developing speech neuroprostheses for people with speech disabilities without relying on their own neural data for training, which is not always feasible.
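The results above are reported as the Pearson correlation coefficient (PCC) between decoded and ground-truth spectrograms, and the multi-subject models are scored with leave-one-out cross-validation over participants. The following is a minimal sketch of that evaluation protocol, not the authors' code. It assumes the PCC is computed over all flattened time-frequency bins of one participant's spectrogram (the abstract does not say whether it is instead averaged per frequency bin). `fit` and `decode` are hypothetical placeholders for training and running a decoder such as SwinTW.

```python
from typing import Callable, Dict, Iterator, List, Tuple

import numpy as np


def spectrogram_pcc(decoded: np.ndarray, target: np.ndarray) -> float:
    """Pearson correlation between a decoded and a ground-truth spectrogram.

    Assumption: arrays are (time_frames, freq_bins) and the PCC is taken over
    all time-frequency bins at once (flattened), not averaged per bin.
    """
    if decoded.shape != target.shape:
        raise ValueError("decoded and target spectrograms must have the same shape")
    x = decoded.ravel().astype(float)  # astype returns a copy, so in-place ops are safe
    y = target.ravel().astype(float)
    x -= x.mean()
    y -= y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    return float((x * y).sum() / denom) if denom > 0 else float("nan")


def leave_one_subject_out(subject_ids: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training participants, held-out participant), holding out each once."""
    for held_out in subject_ids:
        yield [s for s in subject_ids if s != held_out], held_out


def mean_loo_pcc(
    data: Dict[str, Tuple[np.ndarray, np.ndarray]],
    fit: Callable,     # hypothetical: trains a multi-subject model on a dict of participants
    decode: Callable,  # hypothetical: maps (model, neural signals) -> decoded spectrogram
) -> float:
    """Average PCC on held-out participants; data maps id -> (neural, spectrogram)."""
    scores = []
    for train_ids, test_id in leave_one_subject_out(list(data)):
        model = fit({s: data[s] for s in train_ids})  # held-out participant never seen
        neural, spectrogram = data[test_id]
        scores.append(spectrogram_pcc(decode(model, neural), spectrogram))
    return float(np.mean(scores))
```

Because each participant is held out in turn and the model is retrained without them, the averaged score reflects decoding on participants whose data never entered training, which is what the leave-one-out PCC in the abstract is meant to capture.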

Employing Deep Learning Model to Evaluate Speech Information in Acoustic Simulations of Auditory Implants

Employing deep learning model to evaluate speech information in acoustic simulations of Cochlear implants

Validation of Acoustic Models of Auditory Neural Prostheses

DeepSpeech models show Human-like Performance and Processing of Cochlear Implant Inputs

The Realization of Acoustic Model Based on Noise Modulating for Cochlear Implantation

Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses

Deep Neural Network Based Noised Asian Speech Enhancement and Its Implementation on a Hearing Aid App

A model of speech recognition for hearing-impaired listeners based on deep learning

Multimodal Speech Recognition Using EEG and Audio Signals: A Novel Approach for Enhancing ASR Systems

Real-time Synthesis of Imagined Speech Processes from Minimally Invasive Recordings of Neural Activity

Neural Speech Decoding During Audition, Imagination and Production

Towards reconstructing intelligible speech from the human auditory cortex

Deep learning restores speech intelligibility in multi-talker interference for cochlear implant users

Predicting speech intelligibility from EEG in a non-linear classification paradigm

A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications

Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios

Using Automatic Speech Recognition to Measure the Intelligibility of Speech Synthesized from Brain Signals

Audio-Visual Speech Enhancement Using Self-supervised Learning to Improve Speech Intelligibility in Cochlear Implant Simulations

Toward Assessment of Human Voice Biomarkers of Brain Lesions Through Explainable Deep Learning

Subject-Agnostic Transformer-Based Neural Speech Decoding from Surface and Depth Electrode Signals

How to Train Your Ears: Auditory-Model Emulation for Large-Dynamic-Range Inputs and Mild-to-Severe Hearing Losses