Abstract: Damage or degeneration of motor pathways necessary for speech and other movements, as in brainstem strokes or amyotrophic lateral sclerosis (ALS), can interfere with efficient communication without affecting brain structures responsible for language or cognition. In the worst-case scenario, this can result in locked-in syndrome (LIS), a condition in which individuals cannot initiate communication and can only express themselves by answering yes/no questions with eye blinks or other rudimentary movements. Existing augmentative and alternative communication (AAC) devices that rely on eye tracking can improve the quality of life for people with this condition, but brain-computer interfaces (BCIs) are also increasingly being investigated as AAC devices, particularly when eye tracking is too slow or unreliable. Moreover, with recent and ongoing advances in machine learning and neural recording technologies, BCIs may offer the only means to go beyond cursor control and text generation on a computer, to allow real-time synthesis of speech, which would arguably offer the most efficient and expressive channel for communication. The potential for BCI speech synthesis has only recently been realized because of seminal studies of the neuroanatomical and neurophysiological underpinnings of speech production using intracranial electrocorticographic (ECoG) recordings in patients undergoing epilepsy surgery. These studies have shown that cortical areas responsible for vocalization and articulation are distributed over a large area of ventral sensorimotor cortex, and that it is possible to decode speech and reconstruct its acoustics from ECoG if these areas are recorded with sufficiently dense and comprehensive electrode arrays. In this article, we review these advances, including the latest neural decoding strategies that range from deep learning models to the direct concatenation of speech units. We also discuss state-of-the-art vocoders that are integral to constructing natural-sounding audio waveforms for speech BCIs. Finally, this review outlines some of the challenges ahead in directly synthesizing speech for patients with LIS.

Acoustic inspired brain-to-sentence decoder for logosyllabic language

Silent Speech Decoding Using Spectrogram Features Based on Neuromuscular Activities

Speech neuromuscular decoding based on spectrogram images using conformal predictors with Bi-LSTM

Speech decoding using cortical and subcortical electrophysiological signals

Decoding Chinese phonemes from intracortical brain signals with hyperbolic-space neural representations

A brain-to-text framework for decoding natural tonal sentences

Decoding Bilingual EEG Signals With Complex Semantics Using Adaptive Graph Attention Convolutional Network

Decoding Linguistic Representations of Human Brain

Reconstructing Multi-Stroke Characters from Brain Signals Toward Generalizable Handwriting Brain-Computer Interfaces

Silent EEG Classification Using Cross-Fusion Adaptive Graph Convolution Network for Multilingual Neurolinguistic Signal Decoding

Decoding and synthesizing tonal language speech from brain activity

Neural Decoding of Chinese Sign Language With Machine Learning for Brain-Computer Interfaces

A neural speech decoding framework leveraging deep learning and speech synthesis

Recognizing Tonal and Non-Tonal Mandarin Sentences for EEG-based Brain-Computer Interface

Recognizing Tonal and Nontonal Mandarin Sentences for EEG-Based Brain–Computer Interface

Effective Phoneme Decoding With Hyperbolic Neural Networks for High-Performance Speech BCIs

NeuSpeech: Decode Neural signal as Speech

Brain2Char: A Deep Architecture for Decoding Text from Brain Recordings

Speech decoding from stereo-electroencephalography (sEEG) signals using advanced deep learning methods

Brain-Computer Interface: Applications to Speech Decoding and Synthesis to Augment Communication

Decoding Continuous Character-based Language from Non-invasive Brain Recordings