EarSSR: Silent Speech Recognition via Earphones

Xue Sun, Jie Xiong, Chao Feng, Haoyu Li, Yuli Wu, Dingyi Fang, Xiaojiang Chen
DOI: https://doi.org/10.1109/tmc.2024.3356719
IF: 6.075
2024-01-01
IEEE Transactions on Mobile Computing
Abstract: As the most natural and convenient way to communicate with people, speech is always preferred in human-computer interaction. However, voice-based interaction still has several limitations: it raises privacy concerns in some circumstances, and its accuracy degrades severely in noisy environments. To address these limitations, silent speech recognition (SSR) has been proposed, which leverages inaudible information (e.g., lip movements and throat vibration) to recognize speech. In this paper, we present EarSSR, an earphone-based silent speech recognition system that enables interaction with humans and devices without the need for vocalization. The key insight is that when people speak, their ear canals exhibit unique deformation patterns, and these patterns are related to the words/letters being spoken even without any vocalization. We utilize the built-in microphone and speaker of an earphone to capture the ear canal deformation. Ultrasound signals are emitted, and the reflected signals are analyzed to extract the signal features corresponding to speech-induced ear canal deformation for silent speech recognition. We design a two-channel hierarchical convolutional neural network to achieve fine-grained letter/word recognition. Our extensive experiments show that EarSSR can achieve an accuracy of 82% for single alphabetic letter recognition and an accuracy of 93% for word recognition.
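The sensing step described in the abstract (emit ultrasound, analyze the reflections) can be illustrated with a minimal sketch. The abstract does not state the probing waveform, sampling rate, or feature extraction that EarSSR uses, so everything below is an assumption made for illustration only: a linear chirp in an assumed 18-22 kHz band at an assumed 48 kHz sampling rate, a simulated single echo, and a plain cross-correlation to obtain an echo profile. All function names are hypothetical and are not taken from the paper.

```python
import math

FS = 48_000               # assumed sampling rate in Hz (not from the paper)
F0, F1 = 18_000, 22_000   # assumed near-inaudible probing band in Hz (not from the paper)
CHIRP_LEN = 480           # assumed chirp length: 10 ms at 48 kHz


def make_chirp(n=CHIRP_LEN, fs=FS, f0=F0, f1=F1):
    """Return a linear chirp sweeping f0 -> f1 over n samples."""
    duration = n / fs
    sweep_rate = (f1 - f0) / duration  # Hz per second
    chirp = []
    for i in range(n):
        t = i / fs
        phase = 2 * math.pi * (f0 * t + 0.5 * sweep_rate * t * t)
        chirp.append(math.sin(phase))
    return chirp


def simulate_rx(tx, delay, gain, total_len):
    """Hypothetical stand-in for the microphone: one delayed, attenuated echo."""
    rx = [0.0] * total_len
    for i, sample in enumerate(tx):
        rx[i + delay] += gain * sample
    return rx


def echo_profile(tx, rx, max_lag):
    """Cross-correlate rx with tx for lags 0..max_lag-1."""
    n = len(tx)
    return [sum(tx[i] * rx[i + lag] for i in range(n)) for lag in range(max_lag)]


def strongest_echo_lag(profile):
    """Lag (in samples) with the largest absolute correlation."""
    return max(range(len(profile)), key=lambda lag: abs(profile[lag]))


tx = make_chirp()
rx = simulate_rx(tx, delay=7, gain=0.5, total_len=len(tx) + 64)
profile = echo_profile(tx, rx, max_lag=64)
peak_lag = strongest_echo_lag(profile)
```

In a setup like this, changes in the echo profile across successive chirps would be the raw material for recognition; how EarSSR actually derives its features and feeds them to the two-channel network is described in the paper itself, not here.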
computer science, information systems, telecommunications