Decoding speech perception from non-invasive brain recordings

Alexandre Défossez, Charlotte Caucheteux, Jérémy Rapin, Ori Kabeli, Jean-Rémi King

DOI: https://doi.org/10.1038/s42256-023-00714-5

2023-10-05

Abstract: Decoding speech from brain activity is a long-awaited goal in both healthcare and neuroscience. Invasive devices have recently led to major milestones in that regard: deep learning algorithms trained on intracranial recordings now start to decode elementary linguistic features (e.g. letters, words, spectrograms). However, extending this approach to natural speech and non-invasive brain recordings remains a major challenge. Here, we introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from the non-invasive recordings of a large cohort of healthy individuals. To evaluate this approach, we curate and integrate four public datasets, encompassing 175 volunteers recorded with magneto- or electro-encephalography (M/EEG), while they listened to short stories and isolated sentences. The results show that our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities on average across participants, and more than 80% in the very best participants - a performance that allows the decoding of words and phrases absent from the training set. The comparison of our model to a variety of baselines highlights the importance of (i) a contrastive objective, (ii) pretrained representations of speech and (iii) a common convolutional architecture simultaneously trained across multiple participants. Finally, the analysis of the decoder's predictions suggests that they primarily depend on lexical and contextual semantic representations. Overall, this effective decoding of perceived speech from non-invasive recordings delineates a promising path to decode language from brain activity, without putting patients at risk for brain surgery.

Audio and Speech Processing, Artificial Intelligence, Machine Learning, Neurons and Cognition

What problem does this paper attempt to address?

This paper addresses the problem of decoding speech perception from non-invasive brain recordings, such as MEG and EEG, in healthy individuals and without brain surgery. Most current methods that achieve high-precision speech decoding rely on invasive devices, which require brain surgery and whose signal quality is difficult to maintain over the long term. The study therefore proposes a model trained with contrastive learning. The model maps non-invasive brain recordings of healthy volunteers onto deep speech representations, themselves learned by self-supervision on large-scale speech data, in order to identify which speech segment a participant heard. The researchers curated and integrated four public datasets comprising MEG or EEG recordings of 175 participants listening to short stories or isolated sentences. From 3 seconds of MEG signal, the model identifies the corresponding speech segment among more than 1,000 candidates with up to 41% accuracy on average across participants, and with more than 80% accuracy in the best participants. This performance allows the model to decode words and phrases that did not appear in the training set. Comparisons with baselines highlight the importance of three components: the contrastive objective, pretrained speech representations, and a common convolutional architecture trained simultaneously across multiple participants. Overall, the research shows that perceived speech can be decoded effectively from non-invasive brain recordings, which offers new insights for the future development of non-invasive brain-computer interfaces.
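To make the contrastive objective and the segment-identification metric more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the cosine-similarity scoring, the `temperature` parameter, and the synthetic embeddings are all hypothetical choices, and the brain and speech encoders are assumed to have already produced one embedding per segment.

```python
import numpy as np


def _similarity(brain_emb, speech_emb):
    """Cosine similarity between every brain embedding and every speech embedding."""
    b = brain_emb / np.linalg.norm(brain_emb, axis=1, keepdims=True)
    s = speech_emb / np.linalg.norm(speech_emb, axis=1, keepdims=True)
    return b @ s.T  # shape: (n_segments, n_segments)


def contrastive_loss(brain_emb, speech_emb, temperature=0.1):
    """Cross-entropy over candidates: row i of brain_emb should match row i of speech_emb.

    The temperature is an assumed hyperparameter, not a value from the paper.
    """
    logits = _similarity(brain_emb, speech_emb) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def top1_accuracy(brain_emb, speech_emb):
    """Fraction of brain segments whose most similar speech candidate is the true one."""
    predicted = _similarity(brain_emb, speech_emb).argmax(axis=1)
    return float(np.mean(predicted == np.arange(len(predicted))))


# Synthetic stand-ins for encoder outputs (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
speech = rng.normal(size=(200, 32))                    # 200 candidate speech segments
brain = speech + 0.5 * rng.normal(size=speech.shape)   # noisy "brain" view of each segment

print("loss:", contrastive_loss(brain, speech))
print("top-1 accuracy:", top1_accuracy(brain, speech))
```

Under this framing, the reported accuracy corresponds to `top1_accuracy` computed over a candidate set of more than 1,000 segments. Minimizing the loss pushes each brain embedding toward its own speech segment and away from the other candidates, which is why the approach can rank segments it never saw during training.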

Decoding speech from non-invasive brain recordings

Speech decoding from stereo-electroencephalography (sEEG) signals using advanced deep learning methods

The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning

NeuSpeech: Decode Neural signal as Speech

Towards Unified Neural Decoding of Perceived, Spoken and Imagined Speech from EEG Signals

Decoding Imagined and Spoken Phrases From Non-invasive Neural (MEG) Signals

Semantic reconstruction of continuous language from non-invasive brain recordings

Continuous and discrete decoding of overt speech with scalp electroencephalography (EEG)

Brain decoding: toward real-time reconstruction of visual perception

Continuous and discrete decoding of overt speech with electroencephalography

Towards Naturalistic Speech Decoding from Intracranial Brain Data

Improving Speech Decoding from ECoG with Self-Supervised Pretraining

Decoding Continuous Character-based Language from Non-invasive Brain Recordings

Resolving Domain Shift For Representations Of Speech In Non-Invasive Brain Recordings

Towards Decoding Brain Activity During Passive Listening of Speech

Speech decoding from a small set of spatially segregated minimally invasive intracranial EEG electrodes with a compact and interpretable neural network

Speech decoding using cortical and subcortical electrophysiological signals

Unsupervised decoding of long-term, naturalistic human neural recordings with automated video and audio annotations

A multimodal LLM for the non-invasive decoding of spoken text from brain recordings