Abstract: Objective. Smart hearing aids that can decode the focus of a user's attention could considerably improve comprehension levels in noisy environments. Methods for decoding auditory attention from electroencephalography (EEG) have attracted considerable interest for this reason. Recent studies suggest that the integration of deep neural networks (DNNs) into existing auditory attention decoding algorithms is highly beneficial, although it remains unclear whether these enhanced algorithms can perform robustly in different real-world scenarios. To this end, we sought to characterise the performance of DNNs at reconstructing the envelope of an attended speech stream from EEG recordings in different listening conditions. In addition, given the relatively sparse availability of EEG data, we investigated the possibility of applying subject-independent algorithms to EEG recorded from unseen individuals. Approach. Both linear models and nonlinear DNNs were employed to decode the envelope of clean speech from EEG recordings, with and without subject-specific information. The mean behaviour, as well as the variability of the reconstruction, was characterised for each model. We then trained subject-specific linear models and DNNs to reconstruct the envelope of speech in clean and noisy conditions, and investigated how well they performed in different listening scenarios. We also established that these models can be used to decode auditory attention in competing-speaker scenarios. Main results. The DNNs offered a considerable advantage over their linear counterpart at reconstructing the envelope of clean speech. This advantage persisted even when subject-specific information was unavailable at the time of training. The same DNN architectures generalised to a distinct dataset, which contained EEG recorded under a variety of listening conditions. In competing-speaker and speech-in-noise conditions, the DNNs significantly outperformed the linear models. Finally, the DNNs offered a considerable improvement over the linear approach at decoding auditory attention in competing-speaker scenarios. Significance. We present the first detailed study into the extent to which DNNs can be employed for reconstructing the envelope of an attended speech stream. We conclusively demonstrate that DNNs can improve the reconstruction of the attended speech envelope. The variance of the reconstruction error is shown to be similar for both DNNs and the linear model. Overall, DNNs show promise for real-world auditory attention decoding, since they perform well in multiple listening conditions and generalise to data recorded from unseen participants.
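To make the linear baseline and the attention-decoding step concrete, the following is a minimal sketch of a generic backward model: ridge regression maps time-lagged EEG to a speech envelope, and attention is assigned to whichever candidate envelope correlates more strongly with the reconstruction. The abstract does not specify the models or parameters actually used, so everything here is an assumption for illustration: the function names (`lagged`, `fit_ridge`, `decode_attention`), the lag count, the regularisation strength, and the synthetic data that stands in for real recordings.

```python
import numpy as np

def lagged(eeg, n_lags):
    """Build a design matrix whose row t holds eeg[t], eeg[t+1], ..., eeg[t+n_lags-1].

    The EEG response follows the stimulus in time, so a backward model
    looks at EEG samples at and after the envelope sample it reconstructs.
    """
    n_samples, n_channels = eeg.shape
    X = np.zeros((n_samples, n_channels * n_lags))
    for lag in range(n_lags):
        # Rows near the end have no future samples and stay zero-padded.
        X[: n_samples - lag, lag * n_channels:(lag + 1) * n_channels] = eeg[lag:]
    return X

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def decode_attention(reconstruction, env_a, env_b):
    """Pick the candidate envelope with the higher Pearson correlation."""
    r_a = np.corrcoef(reconstruction, env_a)[0, 1]
    r_b = np.corrcoef(reconstruction, env_b)[0, 1]
    return ("a" if r_a > r_b else "b"), r_a, r_b

# --- Synthetic stand-in for real data (assumption, not the study's data) ---
rng = np.random.default_rng(0)
n_samples, n_channels, n_lags, delay = 4000, 8, 16, 5

def fake_envelope():
    # Smoothed, rectified noise as a crude slowly varying "envelope".
    return np.abs(np.convolve(rng.standard_normal(n_samples), np.ones(20) / 20, mode="same"))

env_a, env_b = fake_envelope(), fake_envelope()   # attended and ignored speakers
gains = rng.standard_normal(n_channels)           # hypothetical channel weights
# EEG tracks the attended envelope with a fixed delay, plus sensor noise.
eeg = np.outer(np.roll(env_a, delay), gains) + 0.3 * rng.standard_normal((n_samples, n_channels))

# Train on the first 3000 samples, evaluate on the held-out remainder.
split = 3000
X = lagged(eeg, n_lags)
w = fit_ridge(X[:split], env_a[:split], alpha=1.0)
reconstruction = X[split:] @ w

label, r_a, r_b = decode_attention(reconstruction, env_a[split:], env_b[split:])
print(f"decoded speaker: {label} (r_a={r_a:.2f}, r_b={r_b:.2f})")
```

In this setup a DNN would replace only the mapping from lagged EEG to the envelope, leaving the correlation-based decision step unchanged, so the linear and nonlinear decoders can be compared on the same reconstruction score.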

Decoding speech envelopes from electroencephalographic recordings: A comparison of regularized linear regression and long short-term memory deep neural network

Robust Cortical Entrainment to the Speech Envelope Relies on the Spectro-Temporal Fine Structure

Speech neuromuscular decoding based on spectrogram images using conformal predictors with Bi-LSTM.

ADT Network: A Novel Nonlinear Method for Decoding Speech Envelopes From EEG Signals

Neural decoding of the speech envelope: Effects of intelligibility and spectral degradation

Robust decoding of the speech envelope from EEG recordings through deep neural networks

Decoding Envelope and Frequency-Following EEG Responses to Continuous Speech Using Deep Neural Networks

Decoding of the speech envelope from EEG using the VLAAI deep neural network

Auditory attention decoding from electroencephalography based on long short-term memory networks

Auditory attention decoding from EEG-based Mandarin speech envelope reconstruction

Speech-reception-threshold estimation via EEG-based continuous speech envelope reconstruction

Vowel speech recognition from rat electroencephalography using long short-term memory neural network.

Envelope reconstruction of speech and music highlights stronger tracking of speech at low frequencies

Characterizing Neural Entrainment to Hierarchical Linguistic Units using Electroencephalography (EEG).

Sea-Wave: Speech envelope reconstruction from auditory EEG with an adapted WaveNet

EEG-based auditory attention decoding using speech-level-based segmented computational models

Envelope reconstruction of speech and music highlights unique tracking of speech at low frequencies

Relating the fundamental frequency of speech with EEG using a dilated convolutional network

Relating EEG recordings to speech using envelope tracking and the speech-FFR

Continuous speech with pauses inserted between words increases cortical tracking of speech envelope

Delineating neural contributions to electroencephalogram-based speech decoding