SilentTalk: Lip Reading Through Ultrasonic Sensing on Mobile Phones

Jiayao Tan, Cam-Tu Nguyen, Xiaoliang Wang
DOI: https://doi.org/10.1109/infocom.2017.8057099
2017-01-01
Abstract: The recently enhanced computing capability and rich sensing functionality of mobile devices have led to the ubiquitous application of speech recognition. Traditional speech recognition records acoustic signals or visual images to interpret speech. However, the acoustic-based scheme has many drawbacks. It is easily affected by environmental noise when users are in a factory or market, and it cannot be used in places where people need to be quiet, such as a library. Moreover, the current design is not suitable for people with speaking or hearing difficulties. The visual-based approach, in turn, is sensitive to light conditions and performs poorly in dark areas. As a result, it is necessary to provide a new human-computer interaction channel to assist speech recognition. This paper presents SilentTalk, a non-invasive lip reading system based on the ultrasonic Doppler effect. The main idea is to generate ultrasonic signals from a mobile phone, then capture the reflections and analyze the fine-grained frequency shift caused by mouth movements. A Frequency Shift Detection Model (FSDM) is proposed to quantify the correlation between frequency variations and the mouth movements that form different syllables. SilentTalk then applies a Continuous Lip Reading Model (CLRM) on top of FSDM to realize continuous lip reading. Based on the Markov assumption, CLRM effectively combines pronunciation rules and context knowledge to connect isolated syllables into words and sentences. Experiments show that SilentTalk can identify 12 basic mouth motions in English with up to 95.4% accuracy. The system can also recognize short sentences of up to six words with an average accuracy of 74.8%.
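To give a rough sense of the scale of the frequency shifts involved, the sketch below applies the standard two-way Doppler approximation for a tone reflected off a slowly moving surface, delta_f = 2 * v * f0 / c. It is not the paper's FSDM. The 20 kHz carrier, the mouth speeds, the 48 kHz sample rate, and the FFT size are illustrative assumptions, and the function names are hypothetical.

```python
# Illustrative only: the carrier frequency, speeds, sample rate, and FFT size
# below are assumptions, not values taken from the paper.

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C


def doppler_shift_hz(carrier_hz: float, reflector_speed_m_s: float) -> float:
    """Approximate shift of a tone reflected off a slowly moving surface.

    A positive speed means the surface moves toward the phone. Uses the
    two-way approximation delta_f = 2 * v * f0 / c, which holds when the
    reflector speed is much smaller than the speed of sound.
    """
    return 2.0 * reflector_speed_m_s * carrier_hz / SPEED_OF_SOUND_M_S


def fft_bin_width_hz(sample_rate_hz: float, fft_size: int) -> float:
    """Frequency spacing between adjacent FFT bins."""
    return sample_rate_hz / fft_size


if __name__ == "__main__":
    carrier_hz = 20_000.0  # assumed near-ultrasonic carrier
    for speed_m_s in (0.01, 0.05, 0.10):  # assumed mouth speeds
        shift = doppler_shift_hz(carrier_hz, speed_m_s)
        print(f"speed {speed_m_s:.2f} m/s -> shift {shift:.2f} Hz")
    # Compare the shifts with the resolution of an assumed 4096-point FFT.
    print(f"bin width: {fft_bin_width_hz(48_000.0, 4096):.2f} Hz")
```

Under these assumptions the shifts are on the order of a few hertz to roughly ten hertz, comparable to or smaller than a single FFT bin, which is consistent with the abstract's emphasis on fine-grained frequency shift analysis.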