Abstract: In natural conversations, listeners must attend to what others are saying while ignoring extraneous background sounds. Recent studies have used encoding models to predict electroencephalography (EEG) responses to speech in noise-free listening situations, sometimes referred to as "speech tracking." Researchers have analyzed how speech tracking changes with different types of background noise. It is unclear, however, whether neural responses from acoustically rich, naturalistic environments with and without background noise can be generalized to more controlled stimuli. If encoding models for acoustically rich, naturalistic stimuli are generalizable to other tasks, this could aid in data collection from populations of individuals who may not tolerate listening to more controlled and less engaging stimuli for long periods of time. We recorded noninvasive scalp EEG while 17 human participants (8 male/9 female) listened to speech without noise and audiovisual speech stimuli containing overlapping speakers and background sounds. We fit multivariate temporal receptive field encoding models to predict EEG responses to pitch, the acoustic envelope, phonological features, and visual cues in both stimulus conditions. Our results suggested that neural responses to naturalistic stimuli were generalizable to more controlled datasets. EEG responses to speech in isolation were predicted accurately using phonological features alone, while responses to speech in a rich acoustic background were more accurate when including both phonological and acoustic features. Our findings suggest that naturalistic audiovisual stimuli can be used to measure receptive fields that are comparable and generalizable to more controlled audio-only stimuli.

SIGNIFICANCE STATEMENT: Understanding spoken language in natural environments requires listeners to parse acoustic and linguistic information in the presence of other distracting stimuli. However, most studies of auditory processing rely on highly controlled stimuli with no background noise, or with background noise inserted at specific times. Here, we compare models where EEG data are predicted based on a combination of acoustic, phonetic, and visual features in highly disparate stimuli: sentences from a speech corpus and speech embedded within movie trailers. We show that modeling neural responses to highly noisy, audiovisual movies can uncover tuning for acoustic and phonetic information that generalizes to simpler stimuli typically used in sensory neuroscience experiments.
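The cross-condition generalization test described above can be illustrated with a minimal sketch. It assumes a temporal receptive field is fit as a time-lagged ridge regression, trained on one stimulus condition and scored on the other by per-channel correlation. All data here are synthetic, and the function names, lag count, and regularization value are hypothetical choices for illustration, not the authors' pipeline or parameters.

```python
import numpy as np

def lag_matrix(stim, n_lags):
    """Stack delayed copies of stim (time x features) into time x (features * n_lags)."""
    n_times, n_feats = stim.shape
    lagged = np.zeros((n_times, n_feats * n_lags))
    for lag in range(n_lags):
        # Column block `lag` holds the stimulus delayed by `lag` samples.
        lagged[lag:, lag * n_feats:(lag + 1) * n_feats] = stim[:n_times - lag]
    return lagged

def fit_trf(stim, eeg, n_lags, alpha):
    """Ridge-regression weights mapping lagged stimulus features to EEG channels."""
    X = lag_matrix(stim, n_lags)
    XtX = X.T @ X
    return np.linalg.solve(XtX + alpha * np.eye(XtX.shape[0]), X.T @ eeg)

def score_trf(weights, stim, eeg, n_lags):
    """Pearson correlation between predicted and observed EEG, one value per channel."""
    pred = lag_matrix(stim, n_lags) @ weights
    pred_c = pred - pred.mean(axis=0)
    eeg_c = eeg - eeg.mean(axis=0)
    return (pred_c * eeg_c).sum(axis=0) / np.sqrt(
        (pred_c ** 2).sum(axis=0) * (eeg_c ** 2).sum(axis=0)
    )

# Synthetic stand-ins (assumptions): 3 stimulus features, 4 EEG channels, 10 lags.
rng = np.random.default_rng(0)
n_feats, n_channels, n_lags = 3, 4, 10
true_w = rng.standard_normal((n_feats * n_lags, n_channels))

def simulate(n_times):
    """Generate random stimulus features and EEG driven by the shared true weights."""
    stim = rng.standard_normal((n_times, n_feats))
    eeg = lag_matrix(stim, n_lags) @ true_w + rng.standard_normal((n_times, n_channels))
    return stim, eeg

stim_natural, eeg_natural = simulate(2000)        # stands in for the movie-trailer condition
stim_controlled, eeg_controlled = simulate(1000)  # stands in for the isolated-sentence condition

# Train on the "naturalistic" condition, then test on the "controlled" one.
weights = fit_trf(stim_natural, eeg_natural, n_lags, alpha=1.0)
r_cross = score_trf(weights, stim_controlled, eeg_controlled, n_lags)
print(r_cross)
```

In this toy setup both conditions share the same underlying weights, so the cross-condition correlations come out high by construction. With real recordings, the comparison of interest is how these cross-condition scores relate to scores from models trained and tested within a single condition.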

Neural Processing of Naturalistic Audiovisual Events in Space and Time

Asymmetrical Cross-Modal Influence on Neural Encoding of Auditory and Visual Features in Natural Scenes

Emergence of Neural Encoding of Auditory Objects While Listening to Competing Speakers

Naturalistic Audiovisual Stimulation Reveals the Topographic Organization of Human Auditory Cortex

Neural Mechanisms of Audiovisual Integration in Integrated Processing for Verbal Perception and Spatial Factors

Cognitive integration of asynchronous natural or non-natural auditory and visual information in videos of real-world events: an event-related potential study

Multisensory processing of naturalistic objects in motion: a high-density electrical mapping and source estimation study

A hybrid learning framework for fine-grained interpretation of brain spatiotemporal patterns during naturalistic functional magnetic resonance imaging

Early Cortical Connective Network Relating To Audiovisual Stimulation By Partial Directed Coherence Analysis

Towards Modeling the Interaction of Spatial-Associative Neural Network Representations for Multisensory Perception

Characterizing the Time-Varying Brain Networks of Audiovisual Integration across Frequency Bands

The Neural Signature of Spatial Frequency-Based Information Integration in Scene Perception

Inverted encoding of neural responses to audiovisual stimuli reveals super-additive multisensory enhancement

The Time Course Of Spatial And Semantic Processing Between Audition And Vision: Evidence From Event-Related Potentials

Neural Integration of Audiovisual Sensory Inputs in Macaque Amygdala and Adjacent Regions

Neural dSCA: demixing multimodal interaction among brain areas during naturalistic experiments

Audiovisual Integration of Natural Auditory and Visual Stimuli in the Real-World Situation

Audiovisualization of real-time neuroimaging data

Generalizable EEG Encoding Models with Naturalistic Audiovisual Stimuli

Spatial and Semantic Processing Between Audition and Vision: an Event-Related Potential Study

Semantic Congruent Audiovisual Integration During the Encoding Stage of Working Memory: an ERP and sLORETA Study