Abstract: The electroencephalogram (EEG) offers a non-invasive means by which a listener's auditory system may be monitored during continuous speech perception. Reliable auditory-EEG decoders could facilitate the objective diagnosis of hearing disorders, or find applications in cognitively steered hearing aids. Previously, we developed decoders for the ICASSP Auditory EEG Signal Processing Grand Challenge (SPGC). These decoders aimed to solve the match-mismatch task: given a short temporal segment of EEG recordings and two candidate speech segments, the task is to identify which of the two speech segments is temporally aligned, or matched, with the EEG segment. The decoders made use of cortical responses to the speech envelope, as well as speech-related frequency-following responses, to relate the EEG recordings to the speech stimuli. Here we comprehensively document the methods by which the decoders were developed. We extend our previous analysis by exploring the association between speaker characteristics (pitch and sex) and classification accuracy, and provide a full statistical analysis of the final performance of the decoders as evaluated on a held-out portion of the dataset. Finally, the generalisation capabilities of the decoders are characterised by evaluating them on an entirely different dataset, which contains EEG recorded under a variety of speech-listening conditions. The results show that the match-mismatch decoders classify accurately and robustly, and that they can even serve as auditory attention decoders without additional training.

What problem does this paper attempt to address?

The paper addresses how deep neural networks (DNNs) can decode electroencephalogram (EEG) signals recorded during continuous speech stimulation, drawing on two kinds of auditory response: cortical responses to the speech envelope and speech-related frequency-following responses (speech-FFRs). It focuses on the "match-mismatch" task, which asks which of two candidate speech segments is temporally aligned with a short multi-channel EEG recording.

### Background and Motivation

- **Understanding the Auditory System**: The processes by which human listeners perceive and understand spoken language are not fully understood. They must be fast enough for real-time comprehension and robust enough to maintain performance under adverse listening conditions.
- **Non-invasive Monitoring**: EEG provides a non-invasive means to monitor the activity of the auditory system during continuous speech perception. Reliable auditory-EEG decoders could be used for the objective diagnosis of hearing impairments or applied in cognitively steered hearing aids.
- **Challenges**: EEG signals are very noisy and are contaminated by heart, eye, and muscle activity, as well as by external electromagnetic fields. Identifying and separating the auditory-related components of the recorded signals is therefore a significant challenge.

### Specific Problem

- **Match-Mismatch Task**: Given a short EEG recording and two candidate speech segments, determine which speech segment is temporally aligned with the EEG segment.
- **Feature Extraction**: Use cortical responses to the speech envelope and speech-FFRs to relate the EEG recordings to the speech stimuli.
- **Generalization Ability**: Evaluate the decoders on unseen datasets, particularly under different listening conditions.

### Methods and Techniques

- **Datasets**: The SparrKULee dataset, provided by the ICASSP 2023 Auditory EEG Decoding Signal Processing Grand Challenge (SPGC), and the ICL dataset were used.
- **Preprocessing**: The EEG and speech signals were preprocessed, including removal of slow drifts, noise suppression, and resampling.
- **Deep Neural Networks**: The DNN-based architecture comprises an EEG module and a stimulus module, and uses convolutional layers and a cosine-similarity operation to classify segments in the match-mismatch task.

### Objectives

- **Improve Decoding Accuracy**: Improve classification accuracy on the match-mismatch task by optimizing the network structure and training methods.
- **Generalization Ability Evaluation**: Characterize how well the decoders generalize to unseen data recorded under different listening conditions.
- **Application Prospects**: Explore potential applications of the decoders in the objective diagnosis of hearing impairments and in cognitively steered hearing aids.

In summary, the paper aims to improve the decoding of EEG signals recorded during continuous speech stimulation using deep learning, particularly in the match-mismatch task, and to evaluate how well the resulting decoders generalize to different datasets.
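The final decision step of a match-mismatch classifier that relies on cosine similarity can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the learned EEG and stimulus modules are omitted, the feature vectors are made-up numbers standing in for their outputs, and the function names `cosine_similarity` and `pick_matched` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_a * norm_b)


def pick_matched(
    eeg_feat: Sequence[float],
    cand_a: Sequence[float],
    cand_b: Sequence[float],
) -> int:
    """Return 0 if candidate A is at least as similar to the EEG features as B, else 1."""
    sim_a = cosine_similarity(eeg_feat, cand_a)
    sim_b = cosine_similarity(eeg_feat, cand_b)
    return 0 if sim_a >= sim_b else 1


# Toy feature vectors (assumed, not real EEG or speech data).
eeg = [0.2, 0.9, -0.4, 0.1]
matched = [0.25, 0.8, -0.5, 0.0]
mismatched = [-0.7, 0.1, 0.6, 0.3]

choice = pick_matched(eeg, matched, mismatched)
print("Selected candidate index:", choice)
```

In a trained decoder the two candidate vectors would come from the stimulus module and the EEG vector from the EEG module, so that the similarity scores reflect learned representations rather than raw signals.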

Decoding Envelope and Frequency-Following EEG Responses to Continuous Speech Using Deep Neural Networks

Decoding auditory attention (in real time) with EEG

The Neural Encoding of Continuous Speech - Recent Advances in EEG and MEG Studies

Relating EEG recordings to speech using envelope tracking and the speech-FFR

Neural decoding of the speech envelope: Effects of intelligibility and spectral degradation

Robust decoding of the speech envelope from EEG recordings through deep neural networks

Continuous and discrete decoding of overt speech with scalp electroencephalography (EEG)

Detecting gamma-band responses to the speech envelope for the ICASSP 2024 Auditory EEG Decoding Signal Processing Grand Challenge

Continuous and discrete decoding of overt speech with electroencephalography

Speech decoding from stereo-electroencephalography (sEEG) signals using advanced deep learning methods

Comparison of linear and nonlinear methods for decoding selective attention to speech from ear-EEG recordings

Delineating neural contributions to electroencephalogram-based speech decoding

Deep learning-based auditory attention decoding in listeners with hearing impairment

Toward Fully-End-to-End Listened Speech Decoding from EEG Signals

Decoding speech perception from non-invasive brain recordings

Decoding imagined speech with delay differential analysis

Decoding speech from non-invasive brain recordings

A neural speech decoding framework leveraging deep learning and speech synthesis

Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses

ADT Network: A Novel Nonlinear Method for Decoding Speech Envelopes From EEG Signals

EEG-based auditory attention decoding using speech-level-based segmented computational models