Abstract: Silent Speech Interfaces (SSIs) have been developed to convert silent articulatory gestures into speech, enabling silent communication in public spaces and aiding individuals with aphasia. Prior SSIs rely on either wearable devices or cameras, which may require prolonged contact with the user or risk privacy leakage. Recent advances in acoustic sensing offer new opportunities for gesture sensing. However, these approaches typically focus on classifying content rather than reconstructing audible speech, and so lose crucial speech characteristics such as speech rate, intonation, and emotion. In this paper, we propose UltraSR, a novel sensing system that supports accurate audible speech reconstruction by analyzing how tiny articulatory gestures disturb the reflected ultrasound signal. UltraSR introduces a multi-scale feature extraction scheme that aggregates information from multiple views, and a new model that establishes a unique mapping between ultrasound and speech signals, so that audible speech can be reconstructed from silent speech. However, establishing this mapping requires plenty of training data. Instead of collecting massive amounts of training data, which is time-consuming, we construct an inverse task, dual to the original one, that generates virtual gestures from widely available audio (e.g., phone calls) to facilitate model training. Furthermore, we introduce a fine-tuning mechanism that uses unlabeled data for user adaptation. We implement UltraSR on a portable smartphone and evaluate it in various environments. The evaluation results show that UltraSR reconstructs speech with a Character Error Rate (CER) as low as 5.22%, and reduces the CER on new users from 80.13% to 6.31% with only 1 hour of ultrasound signals, outperforming state-of-the-art acoustic-based approaches while preserving rich speech information.
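The CER figures above are conventionally computed as the character-level Levenshtein (edit) distance between the recognized text and the reference transcript, divided by the number of reference characters, i.e. CER = (S + D + I) / N. The sketch below illustrates that standard definition; it assumes plain character comparison with no text normalization (the abstract does not state how UltraSR's transcripts are normalized), and the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: the minimum number of character substitutions,
    deletions, and insertions needed to turn hyp into ref."""
    # prev[j] holds the distance between the ref prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref char missing from hyp
                curr[j - 1] + 1,         # insertion: extra char in hyp
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = (S + D + I) / N, where N is the number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of 13 reference characters.
    print(character_error_rate("silent speech", "silent speach"))
```

Because insertions are counted, CER is not capped at 100%; a hypothesis much longer than the reference can score above 1.0.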

EarSSR: Silent Speech Recognition via Earphones

ReHEarSSE: Recognizing Hidden-in-the-Ear Silently Spelled Expressions

Silent Speech Recognition Based on Surface Electromyography

All-weather, natural silent speech recognition via machine-learning-assisted tattoo-like electronics

Quality-aware Aggregated Conformal Prediction for Silent Speech Recognition

Hybrid Silent Speech Interface Through Fusion of Electroencephalography and Electromyography

Mmear: Push the Limit of COTS mmWave Eavesdropping on Headphones

Silent Speech Recognition based on sEMG and EEG Signals

Silent Speech Decoding Using Spectrogram Features Based on Neuromuscular Activities

UltraSR: Silent Speech Reconstruction Via Acoustic Sensing

Msilent: Towards General Corpus Silent Speech Recognition Using COTS mmWave Radar

Design and implementation of a silent speech recognition system based on sEMG signals: A neural network approach

Silenttalk: Lip Reading Through Ultrasonic Sensing on Mobile Phones

Design and implementation of a speaker recognition system

Silent Speech Recognition Based on Surface Electromyography Using a Few Electrode Sites under the Guidance from High-Density Electrode Arrays

Decoding Silent Speech Commands from Articulatory Movements Through Soft Magnetic Skin and Machine Learning

Decoding Silent Speech Based on High-Density Surface Electromyogram Using Spatiotemporal Neural Network

Encoder-Decoder Architectures for Silent Speech Recognition Based on High-density Surface Electromyogram

Silent Speech Eyewear Interface: Silent Speech Recognition Method Using Eyewear and an Ear-Mounted Microphone with Infrared Distance Sensors

IR-UWB Radar-Based Contactless Silent Speech Recognition of Vowels, Consonants, Words, and Phrases

SVoice: Enabling Voice Communication in Silence Via Acoustic Sensing on Commodity Devices