Automatic Auditory Streaming Restores Missing Temporal Modulations in Echoic Speech

Jiaxin Gao, Mingxuan Fang, Honghua Chen, Nai Ding
DOI: https://doi.org/10.1101/2023.05.11.540309
2023-01-01
Abstract: Human listeners can reliably recognize speech in adverse listening environments, and previous studies have shown that reliable neural encoding of slow temporal modulations in speech is essential for speech recognition. Recent behavioral studies demonstrate that long-delay echoes, which are rare in physical environments but common during online conferencing, can eliminate critical temporal modulations. These echoes, however, barely affect speech intelligibility; here we investigate the underlying neural mechanisms. MEG experiments demonstrate that cortical activity can effectively track the temporal modulations eliminated by an echo, a result that cannot be explained by basic neural adaptation mechanisms such as synaptic depression, gain control, and adaptive filtering. Instead, the cortical response to echoic speech is better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect is observed even when attention is diverted, but disappears when speech segregation cues in the spectro-temporal fine structure are degraded. Altogether, these results strongly suggest that the auditory system can automatically segregate speech and its echo and encode them as two auditory streams, providing a potential neural basis for reliable speech recognition in echoic environments.

Competing Interest Statement: The authors have declared no competing interest.
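One simple way to see how a long-delay echo can eliminate a temporal modulation is to treat the echo as a comb filter acting on the speech envelope. The sketch below assumes, as a simplification that is not taken from the paper, that the envelope of echoic speech is the direct envelope plus a copy delayed by tau and scaled by a gain a, so each modulation frequency f is scaled by |H(f)| = |1 + a·exp(-i2πfτ)|. With an equal-amplitude echo (a = 1), the lowest cancelled modulation frequency is 1/(2τ). The function names and the 125 ms delay are hypothetical illustrations, not the authors' stimuli or analysis code.

```python
import cmath
import math


def echo_modulation_gain(f_mod_hz: float, delay_s: float, echo_gain: float = 1.0) -> float:
    """Magnitude of the comb-filter response |1 + a * exp(-i * 2 * pi * f * tau)|.

    Hypothetical model: the envelope of echoic speech is approximated as the
    direct envelope plus a copy delayed by delay_s and scaled by echo_gain.
    """
    h = 1 + echo_gain * cmath.exp(-2j * math.pi * f_mod_hz * delay_s)
    return abs(h)


def first_null_hz(delay_s: float) -> float:
    """Lowest modulation frequency cancelled by an equal-amplitude echo: 1 / (2 * tau)."""
    return 1.0 / (2.0 * delay_s)


if __name__ == "__main__":
    tau = 0.125  # hypothetical 125 ms echo delay
    print(f"first null: {first_null_hz(tau):.1f} Hz")
    for f in (1.0, 2.0, 4.0, 8.0, 12.0):
        print(f"{f:5.1f} Hz -> gain {echo_modulation_gain(f, tau):.3f}")
```

Under this simplified model, a 125 ms equal-amplitude echo cancels the 4 Hz and 12 Hz envelope modulations while doubling the 8 Hz modulation, which illustrates why an echo can remove modulations in the syllabic range. A weaker echo (a < 1) only attenuates these frequencies rather than cancelling them.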