Abstract: Human listeners can reliably recognize speech in complex listening environments. The underlying neural mechanisms, however, remain unclear and cannot yet be emulated by any artificial system. In this dissertation, we study how speech is represented in the human auditory cortex and how this neural representation contributes to reliable speech recognition. Cortical activity from normal-hearing human subjects is recorded noninvasively with magnetoencephalography (MEG) during natural speech listening. We first demonstrate that, when speech is presented in a quiet listening environment, neural activity in auditory cortex is precisely synchronized to the slow temporal modulations of speech. We then investigate how this neural representation is affected by acoustic interference. Acoustic interference degrades speech perception via two mechanisms, informational masking and energetic masking, which we address by using, respectively, a competing speech stream and stationary noise as the interfering sound. When two speech streams are presented simultaneously, cortical activity is predominantly synchronized to the stream the listener attends to, even when the unattended, competing stream is 8 dB more intense. When speech is presented together with spectrally matched stationary noise, cortical activity remains precisely synchronized to the temporal modulations of speech until the noise is 9 dB more intense than the speech. Critically, the accuracy of neural synchronization to speech predicts how well individual listeners can understand speech in noise. Further analysis reveals that two neural sources contribute to speech-synchronized cortical activity: one with a shorter response latency of about 50 ms and the other with a longer response latency of about 100 ms. The longer-latency component, but not the shorter-latency component, shows selectivity to the attended speech and invariance to background noise, indicating a transition within auditory cortex from encoding the acoustic scene to encoding the behaviorally important auditory object. Taken together, these results demonstrate that during natural speech comprehension, neural activity in the human auditory cortex is precisely synchronized to the slow temporal modulations of speech. This neural synchronization is robust to acoustic interference, whether speech or noise, and is therefore a strong candidate for the neural basis of speech recognition that is invariant to the acoustic background.
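The masking levels in the abstract are stated in decibels relative to the target speech: a masker that is 8 dB more intense corresponds to a signal-to-noise ratio (SNR) of -8 dB, i.e. a masker power 10^(8/10), or about 6.3 times that of the speech. As a minimal sketch (not the stimulus-generation code used in the dissertation, which the abstract does not describe), the Python below rescales a masker to a requested SNR before mixing. The helper names `mean_power`, `snr_db`, and `mix_at_snr` are hypothetical, and random noise stands in for real speech recordings.

```python
import numpy as np

def mean_power(x: np.ndarray) -> float:
    """Mean-square power of a signal."""
    return float(np.mean(np.square(x)))

def snr_db(target: np.ndarray, masker: np.ndarray) -> float:
    """SNR in dB: 10 * log10(P_target / P_masker)."""
    return 10.0 * np.log10(mean_power(target) / mean_power(masker))

def mix_at_snr(target: np.ndarray, masker: np.ndarray, snr: float):
    """Rescale `masker` so that snr_db(target, scaled) == snr, then add it to `target`.

    Returns the mixture and the rescaled masker.
    """
    # Desired masker power is P_target / 10**(snr / 10); amplitude scales with sqrt(power).
    gain = np.sqrt(mean_power(target) / (mean_power(masker) * 10.0 ** (snr / 10.0)))
    scaled = gain * masker
    return target + scaled, scaled

# Placeholder signals: 1 s of Gaussian noise at a hypothetical 16 kHz sampling rate.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(16000)

# "Noise 9 dB more intense than the speech" means SNR = -9 dB.
mixture, scaled_noise = mix_at_snr(speech, noise, -9.0)
print(f"SNR: {snr_db(speech, scaled_noise):.2f} dB")   # prints: SNR: -9.00 dB
print(f"Power ratio for 8 dB: {10 ** (8 / 10):.2f}")   # prints: Power ratio for 8 dB: 6.31
```

Here only the masker is rescaled, so the target level stays fixed and the SNR manipulation is carried entirely by the interfering sound; rescaling the target instead would yield the same SNR.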

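Synchronization to slow temporal modulations, and the two response components at about 50 ms and 100 ms, can be illustrated with a toy simulation. The sketch below is an illustration under stated assumptions, not the dissertation's analysis: it builds a stand-in envelope by rectifying and low-pass filtering random noise (the 8 Hz cutoff and 200 Hz sampling rate are hypothetical choices), simulates a "cortical" response as a weaker copy of the envelope delayed by 50 ms plus a stronger copy delayed by 100 ms plus measurement noise, and then computes the Pearson correlation between envelope and response at each lag. The helpers `slow_envelope`, `delay`, and `lagged_correlation` are hypothetical names.

```python
import numpy as np

FS = 200  # Hz; hypothetical sampling rate of envelope and response

def slow_envelope(x: np.ndarray, fs: float, cutoff_hz: float = 8.0) -> np.ndarray:
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    alpha = np.exp(-2.0 * np.pi * cutoff_hz / fs)  # filter pole set by the cutoff
    env = np.empty(len(x))
    acc = 0.0
    for i, v in enumerate(np.abs(x)):
        acc = alpha * acc + (1.0 - alpha) * v
        env[i] = acc
    return env

def delay(x: np.ndarray, n: int) -> np.ndarray:
    """Delay a signal by n >= 0 samples, zero-filling the start."""
    out = np.zeros_like(x)
    out[n:] = x[: len(x) - n]
    return out

def lagged_correlation(stimulus, response, fs, max_lag_s=0.2):
    """Pearson r between the stimulus and the response lagging it by each delay."""
    lags = np.arange(int(round(max_lag_s * fs)) + 1)
    r = np.array([
        np.corrcoef(stimulus[: len(stimulus) - k], response[k:])[0, 1] for k in lags
    ])
    return lags / fs, r

rng = np.random.default_rng(1)
n = 60 * FS  # 60 s of simulated data
env = slow_envelope(rng.standard_normal(n), FS)  # stand-in for a speech envelope

# Hypothetical response: weaker 50 ms component + stronger 100 ms component + noise.
clean = 0.6 * delay(env, int(round(0.05 * FS))) + 1.0 * delay(env, int(round(0.10 * FS)))
response = clean + np.std(clean) * rng.standard_normal(n)

lags_s, r = lagged_correlation(env, response, FS)
peak_lag = lags_s[np.argmax(r)]
early = (lags_s >= 0.03) & (lags_s <= 0.07)
early_peak_lag = lags_s[early][np.argmax(r[early])]
print(f"global peak at {peak_lag * 1000:.0f} ms (r = {r.max():.2f}); "
      f"early peak at {early_peak_lag * 1000:.0f} ms")
```

The correlation peaks near the two simulated delays, but because the envelope is autocorrelated, each peak is smeared over neighboring lags. This is one reason analyses of real recordings often estimate a temporal response function by regularized regression rather than reading latencies directly off raw cross-correlations.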