Phonological representations of auditory and visual speech in the occipito-temporal cortex and beyond

Alice Van Audenhaege, Stefania Mattioni, Filippo Cerpelloni, Remi Gau, Arnaud Szmalec, Olivier Collignon
DOI: https://doi.org/10.1101/2024.07.25.605084
2024-07-26
Abstract: Speech is a multisensory signal that can be extracted from the voice and the lips. Previous studies suggested that occipital and temporal regions encode both auditory and visual speech features, but their precise location and nature remain unclear. We characterized brain activity using fMRI (in male and female participants) to functionally and individually define bilateral Fusiform Face Areas (FFA), the left Visual Word Form Area (VWFA), an audio-visual speech region in the left Superior Temporal Sulcus (lSTS) and control regions in bilateral Parahippocampal Place Areas (PPA). In these regions, we performed multivariate pattern classification of corresponding phonemes (speech sounds) and visemes (lip movements). We observed that the VWFA and lSTS represent phonological information from both vision and sounds. The multisensory nature of phonological representations appeared selective to the anterior portion of VWFA, as we found viseme but not phoneme representation in adjacent FFA or even posterior VWFA, while PPA did not encode phonology in any modality. Interestingly, cross-modal decoding revealed aligned phonological representations across the senses in lSTS, but not in VWFA. A whole-brain cross-modal searchlight analysis additionally revealed aligned audio-visual phonological representations in bilateral pSTS and left somato-motor cortex overlapping with oro-facial articulators. Altogether, our results demonstrate that auditory and visual phonology are represented in the anterior VWFA, extending its functional coding beyond orthography. The geometries of auditory and visual representations do not align in the VWFA as they do in the STS and left somato-motor cortex, suggesting distinct multisensory representations across a distributed phonological network.
Biology
What problem does this paper attempt to address?
The paper investigates which brain regions represent phonological information from auditory speech (phonemes) and visual speech (visemes, i.e. lip movements), and whether those representations share a common format across the two senses. Using functional magnetic resonance imaging (fMRI), the researchers functionally localized several regions in each participant: the bilateral fusiform face areas (FFA), the left visual word form area (VWFA), an audio-visual speech region in the left superior temporal sulcus (lSTS), and, as control regions, the bilateral parahippocampal place areas (PPA). Within these regions they applied multivariate pattern classification to corresponding phonemes and visemes. Both the VWFA and the lSTS represented phonological information from sounds and from lip movements, but they differed in how: in the lSTS a classifier trained on one modality generalized to the other (cross-modal decoding), indicating aligned auditory and visual representations, whereas in the VWFA it did not, so the two modalities are both represented there but with representational geometries that do not align. The multisensory coding was specific to the anterior VWFA: the adjacent FFA and the posterior VWFA represented visemes but not phonemes, and the PPA encoded phonology in neither modality. A whole-brain cross-modal searchlight analysis additionally found aligned audio-visual phonological representations in the bilateral posterior superior temporal sulcus (pSTS) and in the left somato-motor cortex, overlapping with oro-facial articulators. In summary, the paper addresses where auditory and visual phonology are represented in the brain and whether they are aligned, and concludes that the anterior VWFA codes more than orthography and that multisensory representations differ across a distributed phonological network.
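The cross-modal decoding logic mentioned in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' pipeline: the data are synthetic, the nearest-centroid classifier is a stand-in for whatever classifier the study used, and every name and parameter (`simulate`, `fit_centroids`, the voxel counts, the noise level) is hypothetical. It contrasts an "aligned" region, where auditory and visual classes share the same voxel patterns, with an "unaligned" region, where each modality carries class information in different voxels.

```python
import random

N_CLASSES, HALF = 3, 25          # 3 phoneme/viseme classes, 2 x 25 simulated voxels
rng = random.Random(0)           # fixed seed so the sketch is reproducible

def simulate(centroids, n_trials, noise_sd):
    """Return (pattern, label) trials: each class centroid plus Gaussian noise."""
    return [([c + rng.gauss(0.0, noise_sd) for c in centroid], label)
            for label, centroid in enumerate(centroids)
            for _ in range(n_trials)]

def fit_centroids(trials):
    """'Train' a nearest-centroid classifier: average the patterns of each class."""
    n_vox = len(trials[0][0])
    sums = [[0.0] * n_vox for _ in range(N_CLASSES)]
    counts = [0] * N_CLASSES
    for pattern, label in trials:
        counts[label] += 1
        for i, value in enumerate(pattern):
            sums[label][i] += value
    return [[s / counts[k] for s in sums[k]] for k in range(N_CLASSES)]

def predict(model, pattern):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    dists = [sum((p - c) ** 2 for p, c in zip(pattern, centroid)) for centroid in model]
    return dists.index(min(dists))

def accuracy(model, trials):
    return sum(predict(model, p) == y for p, y in trials) / len(trials)

def class_signal():
    """Random class-specific signal over HALF voxels (purely synthetic)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(HALF)] for _ in range(N_CLASSES)]

zeros = [0.0] * HALF
aud = [sig + zeros for sig in class_signal()]            # signal in voxels 0..24
vis_aligned = aud                                        # same geometry as audition
vis_unaligned = [zeros + sig for sig in class_signal()]  # signal in voxels 25..49

# Train one model per modality, then test within and across modalities.
aud_model = fit_centroids(simulate(aud, 30, 1.0))
vis_model = fit_centroids(simulate(vis_unaligned, 30, 1.0))

within_aud = accuracy(aud_model, simulate(aud, 30, 1.0))
within_vis = accuracy(vis_model, simulate(vis_unaligned, 30, 1.0))
cross_aligned = accuracy(aud_model, simulate(vis_aligned, 30, 1.0))
cross_unaligned = accuracy(aud_model, simulate(vis_unaligned, 30, 1.0))

print(f"within auditory: {within_aud:.2f}, within visual: {within_vis:.2f}")
print(f"cross-modal (aligned): {cross_aligned:.2f}, (unaligned): {cross_unaligned:.2f}")
```

In this toy setting, both modalities are decodable within-modality in the unaligned region, yet a classifier trained on auditory patterns performs near chance (1/3) on visual patterns. That mirrors, in simplified form, the reported dissociation between the VWFA (both modalities represented, no cross-modal generalization) and the lSTS (cross-modal generalization).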