End-to-end Neuromorphic Lip Reading

Amélie Gruel, Hugo Bulzomi, Jean Martinet, M. Schweiker
DOI: https://doi.org/10.1109/CVPRW59228.2023.00431
2023-06-01
Abstract: Human speech perception is intrinsically a multi-modal task, since speech production requires the speaker to move the lips, producing visual cues in addition to auditory information. Lip reading consists of visually interpreting the movements of the lips to understand speech, without the use of sound. It is an important task since it can either complement an audio-based speech recognition system or replace it when sound is not available. In this paper, we introduce a neuromorphic model for lip reading that takes as input the events produced by an event-based sensor capturing lip motion, and that classifies short event sequences into word categories using a spiking neural network (SNN) architecture. Experimental results show that the proposed model successfully leverages various advantages of neuromorphic approaches, such as energy efficiency and low latency, which are central features in real-time embedded scenarios. To the best of our knowledge, this is the first proposal of an end-to-end neuromorphic lip reading model.
Engineering, Computer Science