Abstract: Virtual Reality (VR) is a kind of interactive experience technology in which human vision, hearing, expression, voice, and even touch can be brought into the interaction between humans and machines. Lip reading recognition is an emerging technology in human-computer interaction with broad prospects for development. It is particularly important in noisy environments and for the hearing-impaired population, because it uses visual information from video to make up for missing or degraded voice information. This information forms a visual language that can benefit from Augmented Reality (AR), with the aim of establishing an efficient and convenient means of communication. However, traditional lip reading recognition systems demand high running speed and device performance because of their long recognition pipelines and large numbers of parameters, which makes them difficult to use in practical applications. In this paper, we implement, for the first time, a mobile lip reading recognition system based on the Raspberry Pi; this application represents the latest stage of our research. The system works in three stages. First, we extract key frames from our own independently built database and use a multi-task cascaded convolutional network (MTCNN) to correct the face, which improves the accuracy of lip extraction. Second, we use MobileNets to extract lip image features and a long short-term memory (LSTM) network to capture sequence information across the key frames. Finally, we compare three lip reading models: (1) a fusion model of Bi-LSTM and AlexNet, (2) a fusion model with an attention mechanism, and (3) our proposed hybrid network of LSTM and MobileNets. The results show that our model has fewer parameters and lower complexity than the other two, and it achieves an accuracy of 86.5% on the test dataset. Therefore, our mobile lip reading system is simpler and smaller than PC-based systems and saves computing resources and memory space.
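The abstract does not say how key frames are chosen. As a minimal sketch, assuming a simple inter-frame-difference heuristic (not necessarily the authors' method), the function below keeps a frame only when it differs enough from the last kept frame. Frames are assumed to be grayscale images flattened into lists of pixel values; `select_key_frames`, `mean_abs_diff`, and `threshold` are hypothetical names.

```python
from typing import List, Sequence

# Hypothetical representation: a grayscale frame flattened to a list of pixel values.
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(frames: List[Frame], threshold: float) -> List[int]:
    """Return indices of key frames (assumed inter-frame-difference heuristic).

    The first frame is always kept; a later frame is kept only if its mean
    absolute difference from the most recently kept frame exceeds `threshold`.
    """
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    clip = [[0, 0, 0, 0], [1, 0, 0, 0], [10, 10, 10, 10], [10, 10, 11, 10], [30, 30, 30, 30]]
    print(select_key_frames(clip, threshold=5.0))  # [0, 2, 4]
```

Comparing against the last kept frame, rather than the immediately previous one, prevents a slow drift of small changes from producing a run of nearly identical key frames.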
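MTCNN returns a face bounding box plus five landmarks (both eyes, the nose tip, and both mouth corners). One common reading of "correcting the face" is roll alignment: rotate the face so the eyes lie on a horizontal line, then crop the lips. The sketch below assumes the landmarks are already available as (x, y) pixel coordinates; the rotation-based alignment, `lip_box`, the `margin` value, and the 2:1 crop shape are assumptions for illustration, not the paper's stated procedure.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def roll_angle(left_eye: Point, right_eye: Point) -> float:
    """Angle (radians) of the eye line relative to the image x-axis."""
    return math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])


def rotate(p: Point, center: Point, angle: float) -> Point:
    """Rotate p about center by -angle, so a line at `angle` becomes horizontal."""
    c, s = math.cos(-angle), math.sin(-angle)
    x, y = p[0] - center[0], p[1] - center[1]
    return (center[0] + c * x - s * y, center[1] + s * x + c * y)


def lip_box(left_eye: Point, right_eye: Point, mouth_left: Point,
            mouth_right: Point, margin: float = 0.3) -> Tuple[float, float, float, float]:
    """Hypothetical lip crop (x0, y0, x1, y1) in the roll-corrected frame.

    The face is de-rotated about the eye midpoint; the box spans the two
    mouth corners, padded by `margin` x mouth width on each side, and its
    height is half the padded width (an assumed 2:1 aspect ratio).
    """
    angle = roll_angle(left_eye, right_eye)
    center = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    ml = rotate(mouth_left, center, angle)
    mr = rotate(mouth_right, center, angle)
    width = mr[0] - ml[0]
    pad = margin * width
    cy = (ml[1] + mr[1]) / 2
    half_h = (width + 2 * pad) / 4
    return (ml[0] - pad, cy - half_h, mr[0] + pad, cy + half_h)


if __name__ == "__main__":
    # Level face: eyes at the same height, so no rotation is applied.
    print(lip_box((30, 40), (70, 40), (35, 80), (65, 80)))  # (26.0, 68.0, 74.0, 92.0)
```

Because the crop is computed in the de-rotated frame, a tilted head yields the same lip box as an upright one, which is the property that should make lip extraction more consistent across frames.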
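Much of the parameter saving attributed to MobileNets comes from factoring a standard convolution into a depthwise convolution (one k x k filter per input channel) followed by a 1 x 1 pointwise convolution that mixes channels. The arithmetic below counts weights for both forms (biases omitted); the layer sizes in the usage block are hypothetical, not taken from the paper.

```python
from fractions import Fraction


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv that mixes channels (biases omitted)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def reduction_ratio(k: int, c_out: int) -> Fraction:
    """Separable / standard weight ratio, which simplifies to 1/c_out + 1/k^2."""
    return Fraction(1, c_out) + Fraction(1, k * k)


if __name__ == "__main__":
    k, c_in, c_out = 3, 32, 64  # hypothetical layer sizes
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep)  # 18432 2336
```

For this 3 x 3 layer mapping 32 to 64 channels, the separable form needs about 12.7% of the weights (73/576 = 1/64 + 1/9). With 3 x 3 kernels the ratio can never drop below 1/9, so the saving is roughly 8 to 9 times for wide layers.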
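Part of the difference between model (1) and model (3) is bidirectional versus unidirectional recurrence. A Bi-LSTM runs two independent LSTMs, one over the key-frame sequence in each direction, so a single layer has twice the parameters of a unidirectional LSTM of the same size. The count below assumes one bias vector per gate (the Keras convention; PyTorch keeps two bias vectors, adding 4 x hidden more per direction). The 1024-dimensional per-frame feature and 256 hidden units are hypothetical sizes.

```python
def lstm_params(input_dim: int, hidden: int, bidirectional: bool = False) -> int:
    """Trainable parameters of one LSTM layer, assuming one bias vector per gate.

    Each of the four gates (input, forget, candidate cell, output) has an
    input weight matrix (hidden x input_dim), a recurrent weight matrix
    (hidden x hidden) and a bias vector of length hidden. A bidirectional
    layer runs two independent LSTMs, one per direction, so it doubles the count.
    """
    per_direction = 4 * (hidden * (input_dim + hidden) + hidden)
    return 2 * per_direction if bidirectional else per_direction


if __name__ == "__main__":
    # Hypothetical sizes: 1024-d per-frame features, 256 hidden units.
    print(lstm_params(1024, 256))                      # 1311744
    print(lstm_params(1024, 256, bidirectional=True))  # 2623488
```

The recurrent layer is only one part of the total, since the convolutional backbone (AlexNet versus MobileNets) also contributes; the sketch isolates the recurrent share of the comparison.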

HearMe: Accurate and Real-time Lip Reading based on Commercial RFID Devices

Lip Reading-Based User Authentication Through Acoustic Sensing on Smartphones

LipPass: Lip Reading-based User Authentication on Smartphones Leveraging Acoustic Signals

Lip Reading Based on 3D Face Modeling and Spatial Transformation Learning

Sentence-Level Sign Language Recognition Using RF Signals

Pushing the limits of remote RF sensing by reading lips under the face mask

Electromyogram-Based Lip-Reading via Unobtrusive Dry Electrodes and Machine Learning Methods

Artificial intelligence enabled smart mask for speech recognition for future hearing devices

SilentTalk: Lip Reading Through Ultrasonic Sensing on Mobile Phones

A data-efficient and easy-to-use lip language interface based on wearable motion capture and speech movement reconstruction

Acoustic-Based Lip Reading for Mobile Devices: Dataset, Benchmark and a Self Distillation-Based Approach

RFace: Anti-Spoofing Facial Authentication Using COTS RFID

Decoding lip language using triboelectric sensors with deep learning

Microwave Lip Reading of Chinese Mandarin Based on Programmable Metasurface

RF-Sign: Position-Independent Sign Language Recognition Using Passive RFID Tags

Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers

Automatic Lip Reading System Based on a Fusion Lightweight Neural Network with Raspberry Pi

Accurate Respiration Monitoring for Mobile Users With Commercial RFID Devices

Microwave Lip Reading Based on Programmable Metasurface Using Deep Learning

All-weather, natural silent speech recognition via machine-learning-assisted tattoo-like electronics

AmbiEar: mmWave Based Voice Recognition in NLoS Scenarios