Cortical tracking of speakers' spectral changes predicts selective listening

Francisco Cervantes Constantino, Angel A. Caputi
DOI: https://doi.org/10.1101/2024.05.23.595545
2024-10-20
Abstract: A social scene is particularly informative when people are distinguishable. To understand somebody amid 'cocktail party' chatter, we automatically index their voice. This ability is underpinned by parallel processing of vocal spectral contours from speech sounds, but it has not yet been established how this occurs in the brain's cortex. We investigate single-trial neural tracking of slow frequency modulations in speech using electroencephalography. Participants briefly listened to unfamiliar single speakers and, in addition, performed a cocktail party comprehension task. Quantified through stimulus reconstruction methods, robust tracking was found in neural responses to slow (delta-theta range) modulations of frequency contours in the fourth and fifth formant bands, equivalent to the 3.5-5 kHz audible range. Instantaneous frequency spacing (DF), which also yields indexical information from the vocal tract, was similarly decodable. Moreover, EEG evidence of listeners' spectral tracking abilities predicted their chances of succeeding at selective listening when faced with two-speaker speech mixtures. In summary, the results indicate that the communicating brain can rely on locking of cortical rhythms to major changes led by upper resonances of the vocal tract. Their corresponding articulatory mechanics hence continuously issue a fundamental credential for listeners to target in real time.
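The abstract says tracking was "quantified through stimulus reconstruction methods" but does not spell them out. A common form is a linear backward (decoding) model. Time-lagged multichannel EEG is mapped onto a stimulus feature, such as a slow formant-frequency contour, with ridge regression. Tracking is then scored as the Pearson correlation between the reconstructed and the actual feature on held-out data. The sketch below is a minimal NumPy version that runs on synthetic data. The 64 Hz sampling rate, the 0-250 ms lag window, the ridge value, the 16 channels and the simulated 6-sample neural delay are all assumptions chosen for illustration. This is not the authors' pipeline.

```python
import numpy as np

def lagged_design(eeg, max_lag):
    """Stack copies of every EEG channel at lags 0..max_lag samples.

    In a backward (decoding) model the brain response follows the stimulus,
    so the estimate of the stimulus at time t draws on EEG samples t..t+max_lag.
    eeg has shape (n_samples, n_channels); rows past the end are zero-padded.
    """
    n, c = eeg.shape
    X = np.zeros((n, c * (max_lag + 1)))
    for lag in range(max_lag + 1):
        X[: n - lag, lag * c:(lag + 1) * c] = eeg[lag:]
    return X

def fit_decoder(eeg, stim, max_lag, ridge):
    """Ridge-regularised least squares: w = (X'X + ridge*I)^-1 X'y."""
    X = lagged_design(eeg, max_lag)
    XtX = X.T @ X
    return np.linalg.solve(XtX + ridge * np.eye(XtX.shape[0]), X.T @ stim)

def reconstruction_accuracy(eeg, stim, w, max_lag):
    """Pearson r between the reconstructed and the actual stimulus feature."""
    recon = lagged_design(eeg, max_lag) @ w
    return float(np.corrcoef(recon, stim)[0, 1])

# --- synthetic demo (all values hypothetical) ---
rng = np.random.default_rng(0)
fs = 64                                  # assumed EEG rate after downsampling, Hz
t = np.arange(60 * fs) / fs              # 60 s of data
freqs = rng.uniform(1.0, 8.0, size=10)   # delta-theta range components
phases = rng.uniform(0.0, 2 * np.pi, size=10)
stim = np.sin(2 * np.pi * freqs[:, None] * t + phases[:, None]).sum(axis=0)
stim = (stim - stim.mean()) / stim.std()  # stand-in for a slow formant contour

n_ch, delay = 16, 6                      # 6 samples at 64 Hz, about 94 ms
gains = rng.normal(size=n_ch)            # how strongly each channel carries the signal
eeg = np.zeros((t.size, n_ch))
eeg[delay:] = stim[:-delay, None] * gains
eeg += 2.0 * rng.normal(size=eeg.shape)  # additive sensor/background noise

split = int(0.8 * t.size)                # train on the first 80 %, test on the rest
max_lag = 16                             # 0-250 ms of post-stimulus EEG
w = fit_decoder(eeg[:split], stim[:split], max_lag, ridge=1e3)
r_test = reconstruction_accuracy(eeg[split:], stim[split:], w, max_lag)
print(f"held-out reconstruction r = {r_test:.2f}")
```

In practice, the ridge parameter would be chosen by cross-validation. The held-out r would also be compared against a null distribution, for example reconstructions scored against mismatched stimulus segments, before it is read as evidence of tracking.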
Neuroscience
What problem does this paper attempt to address?
This paper asks how the cerebral cortex supports selective listening in noisy social settings by tracking changes in a speaker's vocal spectrum. The focus is on how the brain processes and distinguishes the voice characteristics of different speakers. This matters most in the "cocktail party" situation, where several people talk at once and the listener must follow one of them. Using single-trial electroencephalography (EEG) and stimulus reconstruction, the authors test whether cortical activity tracks slow (delta-theta range) modulations of speech frequency contours. They look especially at the fourth and fifth formant bands (roughly 3.5-5 kHz) and at instantaneous frequency spacing (DF). They then test whether a listener's tracking ability predicts their success at selective listening when faced with two-speaker speech mixtures. The main contribution is evidence that cortical rhythms lock onto major changes in the upper resonances of the vocal tract. This gives listeners a continuously available cue for identifying and following a particular speaker in real time. The finding bears directly on models of how humans process speech in noisy environments.
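The summary states that tracking ability predicted selective-listening success, but not which statistical model was used or whether the unit of analysis was the trial or the listener. One common way to test such a link is a logistic regression of a binary outcome on a tracking score. The sketch below fits one by Newton-Raphson on simulated data. The per-trial scores, the trial count and the true effect size are invented for illustration and are not results from the paper.

```python
import numpy as np

def fit_logistic(x, y, n_iter=25):
    """Fit P(success) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                         # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, grad)
    return beta

# --- simulated data (hypothetical, not from the study) ---
rng = np.random.default_rng(1)
n_trials = 400
tracking = rng.uniform(0.0, 0.3, size=n_trials)   # e.g. decoding r per trial
p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 10.0 * tracking)))
success = (rng.random(n_trials) < p_true).astype(float)  # 1 = correct answer

b0, b1 = fit_logistic(tracking, success)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
# predicted chance of success for a weak vs a strong tracker
for r in (0.05, 0.25):
    print(r, 1.0 / (1.0 + np.exp(-(b0 + b1 * r))))
```

A positive slope means higher tracking scores go with a higher probability of success. With repeated trials per listener, a mixed-effects version with listener-level random effects would usually be preferred over this single-level fit.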