Abstract: Objective. Smart hearing aids that can decode the focus of a user's attention could considerably improve comprehension levels in noisy environments. Methods for decoding auditory attention from electroencephalography (EEG) have attracted considerable interest for this reason. Recent studies suggest that the integration of deep neural networks (DNNs) into existing auditory attention decoding algorithms is highly beneficial, although it remains unclear whether these enhanced algorithms can perform robustly in different real-world scenarios. To this end, we sought to characterise the performance of DNNs at reconstructing the envelope of an attended speech stream from EEG recordings in different listening conditions. In addition, given the relatively sparse availability of EEG data, we investigated the possibility of applying subject-independent algorithms to EEG recorded from unseen individuals. Approach. Both linear models and nonlinear DNNs were employed to decode the envelope of clean speech from EEG recordings, with and without subject-specific information. The mean behaviour, as well as the variability of the reconstruction, was characterised for each model. We then trained subject-specific linear models and DNNs to reconstruct the envelope of speech in clean and noisy conditions, and investigated how well they performed in different listening scenarios. We also established that these models can be used to decode auditory attention in competing-speaker scenarios. Main results. The DNNs offered a considerable advantage over their linear counterpart at reconstructing the envelope of clean speech. This advantage persisted even when subject-specific information was unavailable at the time of training. The same DNN architectures generalised to a distinct dataset, which contained EEG recorded under a variety of listening conditions. In competing-speaker and speech-in-noise conditions, the DNNs significantly outperformed the linear models. Finally, the DNNs offered a considerable improvement over the linear approach at decoding auditory attention in competing-speaker scenarios. Significance. We present the first detailed study into the extent to which DNNs can be employed for reconstructing the envelope of an attended speech stream. We conclusively demonstrate that DNNs have the ability to improve the reconstruction of the attended speech envelope. The variance of the reconstruction error is shown to be similar for both DNNs and the linear model. Overall, DNNs show promise for real-world auditory attention decoding, since they perform well in multiple listening conditions and generalise to data recorded from unseen participants.
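The linear baseline and the attention-decoding step summarised in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the linear model is built, so the code below assumes a ridge-regularised backward model over time-lagged EEG, with attention assigned to whichever speaker's envelope correlates more strongly with the reconstruction. All function names, parameter values and the synthetic data are hypothetical and chosen for illustration only; this is not the authors' implementation, and the DNN decoders are not shown.

```python
import numpy as np


def lagged_design(eeg, n_lags):
    """Build a design matrix whose row t holds eeg[t], ..., eeg[t + n_lags - 1].

    eeg has shape (n_samples, n_channels); rows near the end are zero-padded.
    """
    n_samples, n_channels = eeg.shape
    X = np.zeros((n_samples, n_channels * n_lags))
    for lag in range(n_lags):
        # The neural response follows the stimulus, so look forward in time.
        X[: n_samples - lag, lag * n_channels:(lag + 1) * n_channels] = eeg[lag:]
    return X


def fit_ridge_decoder(eeg, envelope, n_lags, alpha):
    """Fit backward-model weights by ridge regression (assumed baseline)."""
    X = lagged_design(eeg, n_lags)
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ envelope)


def reconstruct(eeg, weights, n_lags):
    """Reconstruct the speech envelope from EEG with fitted weights."""
    return lagged_design(eeg, n_lags) @ weights


def pearson(a, b):
    """Pearson correlation between two 1-D signals."""
    return float(np.corrcoef(a, b)[0, 1])


def decode_attention(eeg, env_a, env_b, weights, n_lags):
    """Label the speaker whose envelope best matches the reconstruction."""
    rec = reconstruct(eeg, weights, n_lags)
    r_a, r_b = pearson(rec, env_a), pearson(rec, env_b)
    return ("A" if r_a > r_b else "B"), r_a, r_b


def simulate(rng, n_samples, n_channels, n_lags, noise=1.0):
    """Toy data: EEG is a noisy, causally filtered copy of the attended envelope."""
    env_att = np.abs(rng.standard_normal(n_samples))
    env_ign = np.abs(rng.standard_normal(n_samples))
    env_att -= env_att.mean()  # centre, as envelopes are usually normalised
    env_ign -= env_ign.mean()
    kernels = rng.standard_normal((n_channels, n_lags))
    eeg = np.stack(
        [np.convolve(env_att, k)[:n_samples] for k in kernels], axis=1
    )
    eeg += noise * rng.standard_normal(eeg.shape)
    return eeg, env_att, env_ign


# Train on the first 3000 samples, then decode attention on the held-out rest.
rng = np.random.default_rng(0)
N_LAGS, SPLIT = 5, 3000
eeg, env_att, env_ign = simulate(rng, 4000, 8, N_LAGS)
weights = fit_ridge_decoder(eeg[:SPLIT], env_att[:SPLIT], N_LAGS, alpha=1.0)
label, r_att, r_ign = decode_attention(
    eeg[SPLIT:], env_att[SPLIT:], env_ign[SPLIT:], weights, N_LAGS
)
print(f"decoded speaker: {label}, r_attended={r_att:.2f}, r_ignored={r_ign:.2f}")
```

In this sketch the Pearson correlation between the reconstructed and actual envelopes serves both as the reconstruction score and as the decision statistic for attention decoding. A nonlinear decoder could be swapped in for `fit_ridge_decoder` and `reconstruct` without changing the decision rule in `decode_attention`.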

EEG-informed attended speaker extraction from recorded speech mixtures with application in neuro-steered hearing prostheses

Identification of Attended Speech Stream Using Single-Trial Electroencephalography Recording

EEG-based Auditory Attention Decoding: Towards Neuro-Steered Hearing Devices

NeuroHeed: Neuro-Steered Speaker Extraction using EEG Signals

Linear versus deep learning methods for noisy speech separation for EEG-informed attention decoding

EEG-Derived Voice Signature for Attended Speaker Detection

NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection

Neural decoding of attentional selection in multi-speaker environments without access to clean sources

Real-time control of a hearing instrument with EEG-based attention decoding

Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception

Comparison of linear and nonlinear methods for decoding selective attention to speech from ear-EEG recordings

EEG-based detection of the locus of auditory attention with convolutional neural networks

Brain-controlled augmented hearing for spatially moving conversations in multi-talker environments

EEG decoding of the target speaker in a cocktail party scenario: considerations regarding dynamic switching of talker location

Using Ear-EEG to Decode Auditory Attention in Multiple-speaker Environment

Multimodal Speech Recognition Using EEG and Audio Signals: A Novel Approach for Enhancing ASR Systems

EEG-assisted Modulation of Sound Sources in the Auditory Scene

EEG-based auditory attention decoding with audiovisual speech for hearing-impaired listeners

Speaker separation in realistic noise environments with applications to a cognitively-controlled hearing aid

Robust decoding of the speech envelope from EEG recordings through deep neural networks

Real-Time Tracking of Selective Auditory Attention From M/EEG: A Bayesian Filtering Approach