Abstract: Audiovisual (AV) integration is essential for speech comprehension, especially in adverse listening situations. Divergent, but not mutually exclusive, theories have been proposed to explain the neural mechanisms underlying AV integration. One theory holds that this process occurs via interactions between the auditory and visual cortices, as opposed to fusion of AV percepts in a multisensory integrator. Building upon this idea, we proposed that AV integration in spoken language reflects visually induced weighting of phonetic representations at the auditory cortex. EEG was recorded while male and female human subjects watched and listened to videos of a speaker uttering the consonant-vowel (CV) syllables /ba/ and /fa/, presented in Auditory-only, AV congruent, or AV incongruent contexts. Subjects reported whether they heard /ba/ or /fa/. We hypothesized that vision alters phonetic encoding by dynamically weighting which phonetic representation in the auditory cortex is strengthened or weakened. That is, when subjects are presented with visual /fa/ and acoustic /ba/ and hear /fa/ (illusion-fa), the visual input strengthens the weighting of the phone /f/ representation. When subjects are presented with visual /ba/ and acoustic /fa/ and hear /ba/ (illusion-ba), the visual input weakens the weighting of the phone /f/ representation. Indeed, we found an enlarged N1 auditory evoked potential when subjects perceived illusion-ba, and a reduced N1 when they perceived illusion-fa, mirroring the N1 behavior for /ba/ and /fa/ in Auditory-only settings. These effects were especially pronounced in individuals with more robust illusory perception. These findings provide evidence that visual speech modifies phonetic encoding at the auditory cortex.

SIGNIFICANCE STATEMENT: The current study presents evidence that audiovisual integration in spoken language occurs when one modality (vision) acts on representations of a second modality (audition). Using the McGurk illusion, we show that visual context primes phonetic representations at the auditory cortex and alters the auditory percept, as evidenced by changes in the N1 auditory evoked potential. This finding reinforces the theory that audiovisual integration occurs via visual networks influencing phonetic representations in the auditory cortex. We believe that this will lead to new hypotheses regarding cross-modal mapping, particularly whether it occurs via direct or indirect routes (e.g., via a multisensory mediator).
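The trial categories defined in the abstract can be made concrete with a small labeling routine. The sketch below is a minimal Python illustration under stated assumptions, not the authors' analysis pipeline: the `Trial` record, the single per-trial `n1_amplitude` value, the label strings (including the hypothetical `no-illusion-*` labels for incongruent trials on which the illusion was not reported), and the helper names `label_trial`, `condition_means`, and `illusion_rate` are all introduced here for illustration. It shows how each incongruent trial is sorted into illusion-fa or illusion-ba from the stimulus pair and the reported percept, and how a per-subject illusion rate (one possible index of how robust a subject's illusory perception is) could be computed.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Trial:
    """One hypothetical EEG trial record (field names are illustrative assumptions)."""
    audio: str               # acoustic syllable: "ba" or "fa"
    visual: Optional[str]    # visual syllable: "ba" or "fa"; None for Auditory-only
    response: str            # syllable the subject reported hearing: "ba" or "fa"
    n1_amplitude: float      # assumed single N1 value in microvolts; the N1 is
                             # negative-going, so more negative means a larger N1


def label_trial(t: Trial) -> str:
    """Assign a condition label from the stimulus pair and the reported percept."""
    if t.visual is None:
        return f"A-only-{t.audio}"
    if t.visual == t.audio:
        return f"AV-congruent-{t.audio}"
    # Incongruent trials: the report decides whether the McGurk illusion occurred.
    if t.visual == "fa" and t.audio == "ba":
        # Visual /fa/ + acoustic /ba/ heard as /fa/ -> illusion-fa
        return "illusion-fa" if t.response == "fa" else "no-illusion-ba"
    if t.visual == "ba" and t.audio == "fa":
        # Visual /ba/ + acoustic /fa/ heard as /ba/ -> illusion-ba
        return "illusion-ba" if t.response == "ba" else "no-illusion-fa"
    raise ValueError(f"unexpected stimulus pair: audio={t.audio!r}, visual={t.visual!r}")


def condition_means(trials: List[Trial]) -> Dict[str, float]:
    """Average the (hypothetical) N1 value within each condition label."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for t in trials:
        groups[label_trial(t)].append(t.n1_amplitude)
    return {label: mean(values) for label, values in groups.items()}


def illusion_rate(trials: List[Trial]) -> float:
    """Fraction of incongruent trials on which the illusion was reported."""
    incongruent = [t for t in trials if t.visual is not None and t.visual != t.audio]
    if not incongruent:
        return 0.0
    fooled = sum(label_trial(t).startswith("illusion") for t in incongruent)
    return fooled / len(incongruent)


# Example: visual /fa/ dubbed onto acoustic /ba/, reported as /fa/.
print(label_trial(Trial(audio="ba", visual="fa", response="fa", n1_amplitude=-2.0)))
# prints: illusion-fa
```

Comparing each subject's `condition_means` against that subject's `illusion_rate` would be one way to ask whether the N1 effects scale with illusion robustness; the abstract does not state which measure or statistic the authors used.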

Neural Mechanisms Underlying Cross-Modal Phonetic Encoding
