Phonological representations of auditory and visual speech in the occipito-temporal cortex and beyond

Alice Van Audenhaege, Stefania Mattioni, Filippo Cerpelloni, Remi Gau, Arnaud Szmalec, Olivier Collignon

DOI: https://doi.org/10.1101/2024.07.25.605084

2024-07-26

Abstract: Speech is a multisensory signal that can be extracted from the voice and the lips. Previous studies suggested that occipital and temporal regions encode both auditory and visual speech features, but their precise location and nature remain unclear. We characterized brain activity using fMRI (in male and female participants) to functionally and individually define bilateral Fusiform Face Areas (FFA), the left Visual Word Form Area (VWFA), an audio-visual speech region in the left Superior Temporal Sulcus (lSTS) and control regions in bilateral Para-hippocampal Place Areas (PPA). In these regions, we performed multivariate pattern classification of corresponding phonemes (speech sounds) and visemes (lip movements). We observed that the VWFA and lSTS represent phonological information from both vision and sounds. The multisensory nature of phonological representations appeared selective to the anterior portion of VWFA, as we found viseme but not phoneme representation in adjacent FFA or even posterior VWFA, while PPA did not encode phonology in any modality. Interestingly, cross-modal decoding revealed aligned phonological representations across the senses in lSTS, but not in VWFA. A whole-brain cross-modal searchlight analysis additionally revealed aligned audio-visual phonological representations in bilateral pSTS and left somato-motor cortex overlapping with oro-facial articulators. Altogether, our results demonstrate that auditory and visual phonology are represented in the anterior VWFA, extending its functional coding beyond orthography. The geometries of auditory and visual representations do not align in the VWFA as they do in the STS and left somato-motor cortex, suggesting distinct multisensory representations across a distributed phonological network.

Biology

What problem does this paper attempt to address?

The paper asks which brain regions represent phonological information from heard speech (phonemes) and from seen speech (visemes, i.e. lip movements), and whether those representations share a common format across the two senses. Using functional magnetic resonance imaging (fMRI), the researchers functionally defined, in each participant, the bilateral fusiform face areas (FFA), the left visual word form area (VWFA), an audio-visual speech region in the left superior temporal sulcus (lSTS) and, as control regions, the bilateral parahippocampal place areas (PPA). Within these regions they applied multivariate pattern classification to corresponding phonemes and visemes.

Both the VWFA and the lSTS carried phonological information from sounds and from lip movements, but in different ways. In the lSTS, cross-modal decoding revealed that auditory and visual representations are aligned. In the VWFA, both modalities were represented only in the anterior portion, and cross-modal decoding did not reveal alignment, so the two representations do not appear to share a common geometry there. The adjacent FFA and the posterior VWFA represented visemes but not phonemes, and the PPA did not encode phonology in either modality. A whole-brain cross-modal searchlight analysis additionally found aligned audio-visual phonological representations in the bilateral posterior superior temporal sulcus (pSTS) and in the left somato-motor cortex overlapping with the oro-facial articulators. Overall, the findings extend the functional role of the anterior VWFA beyond orthography and point to distinct multisensory representations across a distributed phonological network.
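To make the difference between "represented in both modalities" and "aligned across modalities" concrete, here is a minimal simulation sketch. It is not the authors' pipeline: the nearest-centroid classifier, the synthetic voxel patterns, and every name and parameter (`decode_region`, `BLOCK`, `NOISE_SD`, the three classes) are assumptions chosen for illustration, and cross-modal decoding is run in one direction only (train on auditory, test on visual). In the simulated "aligned" region the same voxels code a given class in both modalities; in the "unaligned" region each modality uses its own set of voxels.

```python
import random

N_CLASSES = 3                        # hypothetical number of phoneme/viseme classes
BLOCK = 10                           # voxels that carry one class in one modality
N_VOXELS = 2 * N_CLASSES * BLOCK     # room for separate auditory and visual codes
NOISE_SD = 0.5                       # trial-by-trial noise (assumed)

def prototype(label, offset):
    """Noise-free pattern: one block of active voxels per class."""
    pattern = [0.0] * N_VOXELS
    start = offset + label * BLOCK
    for v in range(start, start + BLOCK):
        pattern[v] = 1.0
    return pattern

def simulate(offset, n_per_class, rng):
    """Return (patterns, labels) for one modality of one simulated region."""
    patterns, labels = [], []
    for label in range(N_CLASSES):
        proto = prototype(label, offset)
        for _ in range(n_per_class):
            patterns.append([p + rng.gauss(0.0, NOISE_SD) for p in proto])
            labels.append(label)
    return patterns, labels

def fit_centroids(patterns, labels):
    """Average the training patterns of each class."""
    centroids = []
    for label in range(N_CLASSES):
        rows = [p for p, l in zip(patterns, labels) if l == label]
        centroids.append([sum(col) / len(rows) for col in zip(*rows)])
    return centroids

def predict(centroids, pattern):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(pattern, c)) for c in centroids]
    return dists.index(min(dists))

def accuracy(centroids, patterns, labels):
    hits = sum(predict(centroids, p) == l for p, l in zip(patterns, labels))
    return hits / len(labels)

def decode_region(visual_offset, rng, n_train=40, n_test=40):
    """Within-modality and cross-modal decoding for one simulated region."""
    aud_centroids = fit_centroids(*simulate(0, n_train, rng))
    vis_centroids = fit_centroids(*simulate(visual_offset, n_train, rng))
    aud_test = simulate(0, n_test, rng)
    vis_test = simulate(visual_offset, n_test, rng)
    return {
        "within_auditory": accuracy(aud_centroids, *aud_test),
        "within_visual": accuracy(vis_centroids, *vis_test),
        # Cross-modal: classifier trained on auditory trials, tested on visual ones.
        "cross_modal": accuracy(aud_centroids, *vis_test),
    }

rng = random.Random(0)
# Aligned region: visual classes reuse the auditory voxel blocks.
aligned = decode_region(visual_offset=0, rng=rng)
# Unaligned region: visual classes live in a separate set of voxels.
unaligned = decode_region(visual_offset=N_CLASSES * BLOCK, rng=rng)
print("aligned:  ", aligned)
print("unaligned:", unaligned)
```

In this toy setting both regions support within-modality decoding, but only the aligned region lets a classifier trained on one sense generalize to the other; in the unaligned region cross-modal accuracy stays near the chance level of 1/3. This mirrors the logic of the lSTS versus VWFA contrast described above, without claiming anything about the actual data.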
