CELIP: Ultrasonic-based Lip Reading with Channel Estimation Approach for Virtual Reality Systems

Yongzhao Zhang, Yi-Chao Chen, Haonan Wang, Xingyu Jin
DOI: https://doi.org/10.1145/3460418.3480163
2021-01-01
Abstract: We developed an ultrasonic-based silent speech interface for Virtual Reality (VR). As more customized devices are proposed to enhance the immersion and experience of VR, our system improves interaction between users and VR systems while still allowing various customized devices to be used and avoiding some limitations of traditional speech recognition. By applying channel estimation techniques to ultrasonic waves, we derive the movement characteristics of users’ lips, which can be used to fine-tune existing speech recognition models and can be augmented with vast open-source speech datasets. Moreover, we use the speech interface to guide the initialization of customized models for new users, so that they can easily start using our system. A two-stage experiment shows that our system achieves 90.8% command-level accuracy and a 1.3% word error rate at the sentence level.
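The abstract does not say which probe signal or estimator the system uses, so the following is only a generic sketch of what channel estimation means in this setting: a known signal is played, the echo is recorded, and the channel impulse response is recovered by least squares. The probe sequence, tap values, and the function name `estimate_channel` are all illustrative assumptions, not details from the paper. In a lip-reading setting, changes in the estimated taps over successive frames would be what reflects lip movement.

```python
import numpy as np

def estimate_channel(tx, rx, num_taps):
    """Least-squares estimate of a num_taps-long channel impulse response.

    Generic textbook estimator; hypothetical helper, not the paper's method.
    """
    n = len(tx)
    # Convolution matrix: column k holds the probe delayed by k samples.
    X = np.zeros((n, num_taps))
    for k in range(num_taps):
        X[k:, k] = tx[:n - k]
    h, *_ = np.linalg.lstsq(X, rx[:n], rcond=None)
    return h

rng = np.random.default_rng(0)
tx = rng.choice([-1.0, 1.0], size=256)      # assumed known probe sequence
true_h = np.array([0.0, 0.8, 0.3, -0.1])    # made-up echo taps for the demo
rx = np.convolve(tx, true_h)[:len(tx)]      # noiseless received frame
est_h = estimate_channel(tx, rx, num_taps=4)
```

With a noiseless frame the estimate matches the simulated taps up to floating-point error; with real microphone data the estimate would be noisy and would need to be tracked across frames.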