Abstract: In a continuous listening task, we used magnetoencephalography and corticovocal coherence analysis to evaluate the coupling between the listener's cortical activity and the temporal envelopes of different sounds in a multitalker auditory scene. Neuromagnetic signals were recorded from 20 right-handed healthy adult humans who listened to five different recorded stories (attended speech streams). One story was presented without any multitalker background (No noise), and four were mixed with a “cocktail party” multitalker background noise at four signal-to-noise ratios (5, 0, −5, and −10 dB) to produce speech-in-noise mixtures, here referred to as Global scene. Coherence analysis revealed that the modulations of the attended speech stream, presented without multitalker background, were coupled at ∼0.5 Hz to the activity of both superior temporal gyri, whereas the modulations at 4–8 Hz were coupled to the activity of the right supratemporal auditory cortex. In cocktail party conditions, with the multitalker background noise, the coupling at both frequencies was stronger for the attended speech stream than for the unattended Multitalker background. The coupling strengths decreased as the level of the Multitalker background increased. During the cocktail party conditions, the ∼0.5 Hz coupling became left-hemisphere dominant, compared with bilateral coupling without the multitalker background, whereas the 4–8 Hz coupling remained right-hemisphere lateralized in both conditions. The brain activity was not coupled to the multitalker background or to its individual talkers. The results highlight the key role of the listener's left superior temporal gyrus in extracting the slow ∼0.5 Hz modulations, likely reflecting the attended speech stream within a multitalker auditory scene.

Significance Statement: When people listen to one person in a “cocktail party,” their auditory cortex mainly follows the attended speech stream rather than the entire auditory scene. However, how the brain extracts the attended speech stream from the whole auditory scene, and how increasing background noise corrupts this process, is still debated. In this magnetoencephalography study, subjects attended to a speech stream presented with or without multitalker background noise. The results argue for frequency-dependent cortical tracking mechanisms for the attended speech stream. The left superior temporal gyrus tracked the ∼0.5 Hz modulations of the attended speech stream only when the speech was embedded in a multitalker background, whereas the right supratemporal auditory cortex tracked 4–8 Hz modulations during both noiseless and cocktail-party conditions.
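
The two signal-processing ideas the abstract relies on, mixing speech with noise at a fixed signal-to-noise ratio and measuring coherence between a signal and an envelope at one frequency, can be illustrated on synthetic data. The Python sketch below is not the study's pipeline. The function names (`mix_at_snr`, `coherence`), the RMS-based definition of SNR, the 20 Hz sampling rate, the 2 s epochs, and the white-noise stand-ins for envelopes and cortical activity are all assumptions made for illustration.

```python
import cmath
import math
import random


def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 20*log10(rms(speech)/rms(scaled noise)) == snr_db,
    then add it to `speech` (assumed RMS-based SNR definition)."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


def fourier_coeff(x, freq_hz, fs):
    """Fourier coefficient of one epoch at a single frequency."""
    return sum(v * cmath.exp(-2j * math.pi * freq_hz * i / fs)
               for i, v in enumerate(x))


def coherence(x, y, freq_hz, fs, epoch_len):
    """Magnitude-squared coherence at freq_hz, estimated by averaging
    cross- and auto-spectra over non-overlapping epochs."""
    sxy, sxx, syy = 0j, 0.0, 0.0
    for start in range(0, len(x) - epoch_len + 1, epoch_len):
        fx = fourier_coeff(x[start:start + epoch_len], freq_hz, fs)
        fy = fourier_coeff(y[start:start + epoch_len], freq_hz, fs)
        sxy += fx * fy.conjugate()
        sxx += abs(fx) ** 2
        syy += abs(fy) ** 2
    return abs(sxy) ** 2 / (sxx * syy)


# Synthetic stand-ins (hypothetical, not real recordings).
random.seed(0)
fs, epoch_len, n_epochs = 20, 40, 100  # 2 s epochs give 0.5 Hz resolution
n = epoch_len * n_epochs
attended = [random.gauss(0, 1) for _ in range(n)]    # attended-stream envelope
background = [random.gauss(0, 1) for _ in range(n)]  # independent background
# Simulated cortical signal: follows the attended envelope plus sensor noise.
brain = [a + 0.5 * random.gauss(0, 1) for a in attended]

coh_attended = coherence(brain, attended, 0.5, fs, epoch_len)
coh_background = coherence(brain, background, 0.5, fs, epoch_len)
print("coherence with attended stream:", round(coh_attended, 3))
print("coherence with background:     ", round(coh_background, 3))

# Speech-in-noise mixture at one of the SNRs named in the abstract.
global_scene = mix_at_snr(attended, background, -5)
```

In this toy setup the simulated signal is built from the attended envelope alone, so its coherence with the attended envelope should be high (the theoretical value is 0.8 for the noise level chosen) and its coherence with the independent background should be close to zero.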

Temporal Coherence Shapes Cortical Responses to Speech Mixtures in a Ferret Cocktail Party

Mechanisms Underlying Selective Neuronal Tracking of Attended Speech at a “Cocktail Party”

Neural coding of continuous speech in auditory cortex during monaural and dichotic listening.

Emergence of Neural Encoding of Auditory Objects While Listening to Competing Speakers

Prior Knowledge Guides Speech Segregation in Human Auditory Cortex.

Temporal coding of speech in human auditory cortex

Cortical Neural Coding of Speech in Simple and Complex Auditory Scenes.

Adaptive Temporal Encoding Leads to a Background-Insensitive Cortical Representation of Speech

Robust cortical encoding of slow temporal modulations of speech.

Temporal-Coherence Induces Binding of Responses to Sound Sequences in Ferret Auditory Cortex

Temporal-Coherence Induces Binding in Responses to Sound Sequences in Ferret Auditory Cortex

Temporal Coherence Sensitivity in Auditory Cortex

Human auditory cortex activity shows additive effects of spectral and spatial cues during speech segregation.

Cortical tracking of speakers' spectral changes predicts selective listening

Attention to audiovisual speech shapes neural processing through feedback-feedforward loops between different nodes of the speech network

Cortical Oscillations In Auditory Perception And Speech: Evidence For Two Temporal Windows In Human Auditory Cortex

Dissociable Neural Correlates of Multisensory Coherence and Selective Attention

Selective attention to audiovisual speech routes activity through recurrent feedback-feedforward loops between different nodes of the speech network

Joint population coding and temporal coherence link an attended talker's voice and location features in naturalistic multi-talker scenes.

Left Superior Temporal Gyrus Is Coupled to Attended Speech in a Cocktail-Party Auditory Scene