Abstract: Lip-reading is crucial for understanding speech in challenging conditions. But how the brain extracts meaning from silent, visual speech is still under debate. Lip-reading in silence activates the auditory cortices, but it is not known whether such activation reflects immediate synthesis of the corresponding auditory stimulus or imagery of unrelated sounds. To disentangle these possibilities, we used magnetoencephalography to evaluate how cortical activity in 28 healthy adult humans (17 females) entrained to the auditory speech envelope and lip movements (mouth opening) when listening to a spoken story without visual input (audio-only), and when seeing a silent video of a speaker articulating another story (video-only). In video-only, auditory cortical activity entrained to the absent auditory signal at frequencies <1 Hz more than to the seen lip movements. This entrainment process was characterized by an auditory-speech-to-brain delay of ~70 ms in the left hemisphere, compared with ~20 ms in audio-only. Entrainment to mouth opening was found in the right angular gyrus at <1 Hz, and in early visual cortices at 1–8 Hz. These findings demonstrate that the brain can use a silent lip-read signal to synthesize a coarse-grained auditory speech representation in early auditory cortices. Our data indicate the following underlying oscillatory mechanism: seeing lip movements first modulates neuronal activity in early visual cortices at frequencies that match articulatory lip movements; the right angular gyrus then extracts slower features of lip movements, mapping them onto the corresponding speech sound features; this information is fed to auditory cortices, most likely facilitating speech parsing.

SIGNIFICANCE STATEMENT Lip-reading consists in decoding speech based on visual information derived from observation of a speaker's articulatory facial gestures. Lip-reading is known to improve auditory speech understanding, especially when speech is degraded. Interestingly, lip-reading in silence still activates the auditory cortices, even when participants do not know what the absent auditory signal should be. However, it was uncertain what such activation reflected. Here, using magnetoencephalographic recordings, we demonstrate that it reflects fast synthesis of the auditory stimulus rather than mental imagery of unrelated sounds, whether speech or non-speech. Our results also shed light on the oscillatory dynamics underlying lip-reading.

End-to-end Neuromorphic Lip Reading

Electromyogram-Based Lip-Reading via Unobtrusive Dry Electrodes and Machine Learning Methods

LCSNet: End-to-End Lipreading with Channel-aware Feature Selection

Automatic Lip Reading System Based on a Fusion Lightweight Neural Network with Raspberry Pi

Lip-Reading with Visual Form Classification using Residual Networks and Bidirectional Gated Recurrent Units

Lip-Reading Enables the Brain to Synthesize Auditory Features of Unknown Silent Speech

Visual Words for Automatic Lip-Reading

Intuitive Perception - Speech Recognition using Machine Learning

LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading

Decoding lip language using triboelectric sensors with deep learning

Personalized One-Shot Lipreading for an ALS Patient

Lip Reading using Deep Learning

Lipper: Synthesizing Thy Speech using Multi-View Lipreading

Lip Reading Sentences in the Wild

Lip Reading Using Neural Networks and Deep Learning

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

Sub-word Level Lip Reading With Visual Attention

NeuSpeech: Decode Neural signal as Speech

EMOLIPS: Towards Reliable Emotional Speech Lip-Reading

[Development and evaluation of a deep learning algorithm for German word recognition from lip movements]

FlexLip: A Controllable Text-to-Lip System