Abstract: In a naturalistic environment, auditory cues are often accompanied by information from other senses, which can be redundant with or complementary to the auditory information. Although the multisensory interactions that arise from this combination of information and that shape auditory function are seen across all sensory modalities, our greatest body of knowledge to date centers on how vision influences audition. In this review, we attempt to capture the current state of our understanding of this topic. Following a general introduction, the review is divided into five sections. In the first section, we review the psychophysical evidence in humans regarding vision's influence on audition, distinguishing between vision's ability to enhance and its ability to alter auditory performance and perception. Three examples are then described that highlight vision's ability to modulate auditory processes: spatial ventriloquism, cross-modal dynamic capture, and the McGurk effect. The final part of this section discusses models that have been built on the available psychophysical data and that seek to provide greater mechanistic insight into how vision can impact audition. The second section reviews the extant neuroimaging and far-field imaging work on this topic, with a strong emphasis on the roles of feedforward and feedback processes, on imaging insights into the causal nature of audiovisual interactions, and on the limitations of current imaging-based approaches. These limitations point to a greater need for machine-learning-based decoding approaches to understanding how auditory representations are shaped by vision. The third section reviews the wealth of neuroanatomical and neurophysiological data from animal models that highlights audiovisual interactions at the neuronal and circuit level in both subcortical and cortical structures. It also speaks to the functional significance of audiovisual interactions for two critically important facets of auditory perception: scene analysis and communication. The fourth section presents current evidence for alterations in audiovisual processes in three clinical conditions: autism, schizophrenia, and sensorineural hearing loss. These changes in audiovisual interactions are postulated to have cascading effects on higher-order domains of dysfunction in these conditions. The final section highlights ongoing work that seeks to leverage our knowledge of audiovisual interactions to develop better remediation approaches for these sensory-based disorders, founded on concepts of perceptual plasticity in which vision has been shown to facilitate auditory learning.

A comparison of EEG encoding models using audiovisual stimuli and their unimodal counterparts

Asymmetrical Cross-Modal Influence on Neural Encoding of Auditory and Visual Features in Natural Scenes

Generalizable EEG Encoding Models with Naturalistic Audiovisual Stimuli

Inverted encoding of neural responses to audiovisual stimuli reveals super-additive multisensory enhancement

Neural Mechanisms Underlying Cross-Modal Phonetic Encoding

Congruent audiovisual speech enhances auditory attention decoding with EEG

Visual Influences on Auditory Behavioral, Neural, and Perceptual Processes: A Review

Influence of Auditory Cues on the Neuronal Response to Naturalistic Visual Stimuli in a Virtual Reality Setting

EEG Gamma-Band Activity During Audiovisual Speech Comprehension in Different Noise Environments

Brain Vital Signs: Expanding From the Auditory to Visual Modality

Comparing MEG and EEG measurement set-ups for a brain-computer interface based on selective auditory attention

Congruent Audiovisual Speech Enhances Cortical Envelope Tracking During Auditory Selective Attention

Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution

Visual speech differentially modulates beta, theta, and high gamma bands in auditory cortex

Comparison of Two-Talker Attention Decoding from EEG with Nonlinear Neural Networks and Linear Methods

EEG-based auditory attention decoding with audiovisual speech for hearing-impaired listeners

Prediction and constraint in audiovisual speech perception

What are we really decoding? Unveiling biases in EEG-based decoding of the spatial focus of auditory attention

Laminar organization of visual responses in core and parabelt auditory cortex

Neural Processing of Naturalistic Audiovisual Events in Space and Time

Dissociable Neural Correlates of Multisensory Coherence and Selective Attention