Abstract: Virtual Reality (VR) is a kind of interactive experience technology. Human vision, hearing, expression, voice, and even touch can be added to the interaction between humans and machines. Lip-reading recognition is a new technology in the field of human-computer interaction with broad development prospects. It is particularly important in noisy environments and for the hearing-impaired population, because it uses visual information from video to compensate for missing or degraded voice information. This information is a visual language that benefits from Augmented Reality (AR). The purpose is to establish an efficient and convenient way of communication. However, traditional lip-reading recognition systems place high demands on the running speed and performance of the equipment because of their long recognition process and large number of parameters, so they struggle to meet the requirements of practical application. In this paper, we implement a mobile lip-reading recognition system based on Raspberry Pi for the first time, and the recognition application reaches the latest level of our research. Our mobile lip-reading recognition system can be divided into three stages. First, we extract key frames from our own independent database and then use a multi-task cascaded convolutional network (MTCNN) to correct the face, so as to improve the accuracy of lip extraction. In the second stage, we use MobileNets to extract lip image features and long short-term memory (LSTM) to extract sequence information between key frames. Finally, we compare three lip-reading models: (1) a fusion model of Bi-LSTM and AlexNet; (2) a fusion model with an attention mechanism; and (3) the hybrid LSTM and MobileNets network model that we propose. The results show that our model has fewer parameters and lower complexity. The accuracy of the model on the test dataset is 86.5%. Therefore, our mobile lip-reading system is simpler and smaller than systems on PC platforms, and it saves computing resources and memory space.
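
The claim of fewer parameters can be illustrated with a short calculation. MobileNets replace standard convolutions with depthwise separable convolutions, which need far fewer weights for the same input and output shape. The sketch below is not the authors' configuration: the layer sizes (a 3 x 3 kernel with 256 input and 256 output channels) and the function names are hypothetical, chosen only to show the arithmetic. The LSTM count assumes one bias vector per gate; some frameworks store two.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias omitted)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise kernel x kernel convolution followed by a
    1 x 1 pointwise convolution (biases omitted)."""
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def lstm_params(input_size, hidden_size):
    """Weights and biases in one LSTM layer: four gates, each with input
    weights, recurrent weights, and a single bias vector (an assumption)."""
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


# Hypothetical layer sizes, not taken from the paper.
kernel, c_in, c_out = 3, 256, 256

std = standard_conv_params(kernel, c_in, c_out)
sep = depthwise_separable_params(kernel, c_in, c_out)
ratio = sep / std  # equals 1/c_out + 1/kernel**2

print(f"standard convolution:  {std} weights")
print(f"depthwise separable:   {sep} weights")
print(f"ratio (separable/std): {ratio:.3f}")
```

For these illustrative sizes, the separable layer needs roughly one ninth of the weights of the standard layer, which is the kind of saving that makes a MobileNets feature extractor plausible on a device such as a Raspberry Pi.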
