Abstract: Virtual Reality (VR) is an interactive experience technology in which human vision, hearing, expression, voice and even touch can be brought into the interaction between humans and machines. Lip reading recognition is an emerging technology in human-computer interaction with broad development prospects. It is particularly important in noisy environments and for the hearing-impaired population, because it uses visual information from video to make up for missing or degraded speech information. This visual language also benefits from Augmented Reality (AR). The goal is to establish an efficient and convenient means of communication. However, traditional lip reading recognition systems place high demands on the running speed and performance of the equipment because of their long recognition pipelines and large numbers of parameters, which makes them difficult to use in practical applications. In this paper, a mobile lip reading recognition system based on the Raspberry Pi is implemented for the first time, and this application reflects the latest stage of our research. Our mobile lip reading recognition system can be divided into three stages. First, we extract key frames from our own self-built database and then use a multi-task cascaded convolutional network (MTCNN) to correct the face, which improves the accuracy of lip extraction. Second, we use MobileNets to extract lip image features and a long short-term memory (LSTM) network to extract sequence information between key frames. Finally, we compare three lip reading models: (1) a fusion model of Bi-LSTM and AlexNet; (2) a fusion model with an attention mechanism; and (3) our proposed hybrid network of LSTM and MobileNets. The results show that our model has fewer parameters and lower complexity than the compared models, and it reaches an accuracy of 86.5% on the test dataset. Therefore, our mobile lip reading system is simpler and smaller than PC-based systems and saves computing resources and memory space.
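The abstract does not say how key frames are chosen. One common heuristic, assumed here and not confirmed as the authors' method, keeps the frames that change most relative to their predecessor. In this minimal sketch each grayscale frame is a nested list of pixel intensities, and `mean_abs_diff` and `select_key_frames` are hypothetical helpers.

```python
from typing import List, Sequence

# Grayscale frame as rows of pixel intensities (illustrative representation).
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def select_key_frames(frames: List[Frame], k: int) -> List[int]:
    """Return indices of the k frames that differ most from their predecessor, in temporal order."""
    if k >= len(frames):
        return list(range(len(frames)))
    # Frame 0 has no predecessor, so it gets a motion score of 0.
    scores = [0.0] + [mean_abs_diff(frames[t - 1], frames[t]) for t in range(1, len(frames))]
    ranked = sorted(range(len(frames)), key=lambda t: scores[t], reverse=True)
    # Keep the top-k indices but restore their original temporal order.
    return sorted(ranked[:k])


if __name__ == "__main__":
    clip = [
        [[0, 0], [0, 0]],
        [[0, 0], [0, 0]],
        [[10, 10], [10, 10]],
        [[10, 10], [10, 10]],
        [[50, 50], [50, 50]],
    ]
    print(select_key_frames(clip, 2))  # [2, 4]
```

Restoring temporal order matters because the selected frames are later read as a sequence.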
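MTCNN returns a face box plus five landmarks: both eyes, the nose tip and the two mouth corners. This minimal sketch assumes that "correcting the face" means removing in-plane head roll. It rotates the landmarks so the eye line becomes horizontal and then places a lip box around the mouth corners. The dictionary keys and the `margin` and `aspect` values are illustrative assumptions, not parameters from the paper.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def rotate_point(p: Point, center: Point, angle: float) -> Point:
    """Rotate point p about center by `angle` radians (standard 2D rotation matrix)."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    c, s = math.cos(angle), math.sin(angle)
    return (center[0] + c * dx - s * dy, center[1] + s * dx + c * dy)


def align_and_crop_lips(
    landmarks: Dict[str, Point], margin: float = 0.3, aspect: float = 0.6
) -> Tuple[float, Dict[str, Point], Box]:
    """Level the eye line, then return (roll angle, aligned landmarks, lip box).

    The lip box is (x0, y0, x1, y1) in the aligned frame. `margin` widens the
    box beyond the mouth corners and `aspect` is height / width (assumed values).
    """
    left_eye, right_eye = landmarks["left_eye"], landmarks["right_eye"]
    # Roll angle: direction of the line from the left eye to the right eye.
    angle = math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])
    center = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    # Rotating by -angle maps the eye line onto the horizontal axis.
    aligned = {name: rotate_point(p, center, -angle) for name, p in landmarks.items()}
    ml, mr = aligned["mouth_left"], aligned["mouth_right"]
    mouth_w = math.dist(ml, mr)
    cx, cy = (ml[0] + mr[0]) / 2, (ml[1] + mr[1]) / 2
    half_w = mouth_w * (0.5 + margin)
    half_h = half_w * aspect
    return angle, aligned, (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


if __name__ == "__main__":
    # Hypothetical upright landmarks, tilted by 15 degrees to simulate head roll.
    upright = {
        "left_eye": (-20.0, 0.0), "right_eye": (20.0, 0.0), "nose": (0.0, 20.0),
        "mouth_left": (-15.0, 40.0), "mouth_right": (15.0, 40.0),
    }
    tilted = {k: rotate_point(p, (0.0, 0.0), math.radians(15)) for k, p in upright.items()}
    roll, _, lip_box = align_and_crop_lips(tilted)
    print(f"roll = {math.degrees(roll):.1f} deg, lip box = {tuple(round(v, 1) for v in lip_box)}")
```

Only the landmark coordinates are transformed here. Rotating and cropping the frame itself would be done with an image library of your choice.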
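Much of the parameter saving attributed to MobileNets comes from depthwise separable convolutions. These pair a k x k depthwise convolution, which uses one filter per input channel, with a 1 x 1 pointwise convolution. For M input and N output channels this needs k*k*M + M*N weights instead of k*k*M*N, a ratio of 1/N + 1/(k*k). The layer shape below is an arbitrary example, not one of the paper's layers, and biases are ignored.

```python
def standard_conv_params(k: int, m: int, n: int) -> int:
    """Weights of a standard k x k convolution from m to n channels (no bias)."""
    return k * k * m * n


def depthwise_separable_params(k: int, m: int, n: int) -> int:
    """Weights of a k x k depthwise conv plus a 1 x 1 pointwise conv from m to n channels (no bias)."""
    depthwise = k * k * m  # one k x k filter per input channel
    pointwise = m * n      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 256 input and 256 output channels.
    k, m, n = 3, 256, 256
    std = standard_conv_params(k, m, n)
    sep = depthwise_separable_params(k, m, n)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
    # standard: 589824, separable: 67840, ratio: 0.115
```

For this example the separable layer uses roughly 11.5% of the weights, which matches 1/256 + 1/9.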
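In the second stage, per-frame lip features go into an LSTM that models the sequence of key frames. This pure-Python sketch shows only the data flow. Random vectors stand in for MobileNets features, the weights are untrained, and the sizes (`feat_dim`, `hidden`, `n_classes`, `n_frames`) are hypothetical. The linear softmax head over word classes is an assumed design, not the paper's stated classifier.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(v: Vector) -> Vector:
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


class TinyLSTM:
    """Single-layer LSTM built from plain lists (illustrative and untrained)."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.h_dim = hidden_dim
        cols = input_dim + hidden_dim
        # Rows are grouped by gate: input (i), forget (f), candidate (g), output (o).
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(4 * hidden_dim)]
        self.b = [0.0] * (4 * hidden_dim)

    def step(self, x: Vector, h: Vector, c: Vector):
        z = x + h  # concatenate the current input with the previous hidden state
        pre = [sum(wi * zi for wi, zi in zip(row, z)) + bi for row, bi in zip(self.w, self.b)]
        H = self.h_dim
        i = [sigmoid(v) for v in pre[0:H]]
        f = [sigmoid(v) for v in pre[H:2 * H]]
        g = [math.tanh(v) for v in pre[2 * H:3 * H]]
        o = [sigmoid(v) for v in pre[3 * H:4 * H]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, frames: List[Vector]) -> Vector:
        """Process the frame features in order and return the final hidden state."""
        h = [0.0] * self.h_dim
        c = [0.0] * self.h_dim
        for x in frames:
            h, c = self.step(x, h, c)
        return h


def classify(lstm: TinyLSTM, head_w: List[Vector], frames: List[Vector]) -> Vector:
    """Map the final hidden state to class probabilities with a linear softmax head."""
    h = lstm.run(frames)
    logits = [sum(w * hk for w, hk in zip(row, h)) for row in head_w]
    return softmax(logits)


if __name__ == "__main__":
    rng = random.Random(1)
    feat_dim, hidden, n_classes, n_frames = 8, 4, 3, 5  # hypothetical sizes
    frames = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(n_frames)]
    lstm = TinyLSTM(feat_dim, hidden)
    head = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n_classes)]
    print([round(p, 3) for p in classify(lstm, head, frames)])
```

A real system would use a framework's LSTM and trained weights. The loop here only makes each gate computation explicit.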